My Life With The Google Search Appliance
February 29, 2016
Reflections after 11 years with Google Search Appliance
After 13 years of the Google Search Appliance, Google announced last week that it would wind down sales of the product over the next few years, effectively setting an end date for it. From the start it was a novel concept: “google.com on a box for your internal websites”. That idea grew in capabilities, eventually earning a place in the Gartner Magic Quadrant.
I was first introduced to the GSA back when Google approached MC+A to develop a SharePoint connector for the device. It was the launch of the Google Enterprise Partner program, for which we were the first Partner. Back then there was no feed port, so we developed a site map and a means by which the GSA could crawl SharePoint. (This method later became the design pattern of the Plexi Framework; see Graph Traversal in the Plexi Developer’s Guide.)
Since then, I’ve deployed the GSA to customers over 200 times, developed 12 commercial connectors for it, and presented dozens of times to customers, partners, and at Google. A lot has changed, and I thought I would share some of the things I’ve learned so far:
The box is going away but the concept is not
Software as a service is a growing trend, whether it is delivered by an appliance or by cloud infrastructure. An appliance can only host a finite amount of computing and requires a great deal of logistics to ship around the world. Cloud infrastructure can scale to meet the peak demands that search tends to generate; you simply cannot do that in a 2U form factor.
A great search tool isn’t much good without content
The GSA had a web crawler, a database crawler, and a file crawler, but most enterprise content does not reside in those kinds of systems. Getting content into the search system in a timely manner is what actually matters. To that end, MC+A has developed, and continues to develop, connectors for search systems.
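For repositories the built-in crawlers could not reach, the usual alternative was to push documents to the appliance through its XML feeds protocol. Below is a minimal sketch of that push; the appliance hostname, datasource name, and document URL are placeholders, and the exact XML elements and form fields should be checked against the GSA Feeds Protocol Developer’s Guide.

```python
# Minimal sketch: pushing one document to a GSA over the XML feeds protocol
# instead of waiting for the crawler to discover it. Hostname, datasource,
# and document URL below are placeholders for illustration only.
import requests

GSA_FEED_URL = "http://gsa.example.com:19900/xmlfeed"  # feed port on the appliance

feed_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>my_connector</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://content.example.com/docs/policy-123"
            mimetype="text/html" action="add">
      <content>&lt;html&gt;&lt;body&gt;Expense policy text...&lt;/body&gt;&lt;/html&gt;</content>
    </record>
  </group>
</gsafeed>
"""

# The feed port expects a multipart form post with feedtype, datasource, and data.
response = requests.post(
    GSA_FEED_URL,
    files={
        "feedtype": (None, "incremental"),
        "datasource": (None, "my_connector"),
        "data": ("feed.xml", feed_xml, "text/xml"),
    },
)
response.raise_for_status()
print("Feed accepted:", response.status_code)
```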
APIs, security, and updates continue to be the ingestion challenge
This has nothing to do with the form factor: cloud content systems are very frugal when it comes to letting your data out. For example, Egnyte has a limit of 1,000 API calls per day, no joke. If you have a repository with a million documents, that is going to take a while to ingest. Need to change an indexing setting? You’ll need to re-download everything or maintain a cache. Change a security setting in the repository…and update…and so on…and you’re out of API calls.
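A quick back-of-the-envelope calculation shows why. The numbers below come from the example above, plus the optimistic assumption that each document costs a single API call:

```python
# Rough estimate of how long a full ingest takes under an API quota.
# Figures are illustrative: 1,000 calls/day, one million documents,
# and an assumed one API call per document (no extra metadata or ACL calls).
docs_in_repo = 1_000_000
calls_per_day = 1_000
calls_per_doc = 1

days_for_full_crawl = docs_in_repo * calls_per_doc / calls_per_day
print(f"Full crawl: ~{days_for_full_crawl:,.0f} days (~{days_for_full_crawl / 365:.1f} years)")
# Full crawl: ~1,000 days (~2.7 years)
```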
Very few people care about search, fewer can get an initiative funded
People want it to just work.
Big data came from enterprise search
MapReduce and the Google File System came from the need to store and index billions of documents. It only makes sense that we leverage cloud infrastructure to do this, the way Google does.
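To make that lineage concrete, here is a toy sketch of the map/reduce pattern applied to building an inverted index. It is purely illustrative, not Google’s implementation, and the two sample documents are invented:

```python
# Toy illustration of the map/reduce pattern behind large-scale indexing:
# map emits (term, doc_id) pairs, the reduce step groups them into an
# inverted index. Sample documents are made up.
from collections import defaultdict

docs = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
}

# Map phase: emit a (term, doc_id) pair for every term in every document.
pairs = [(term, doc_id) for doc_id, text in docs.items() for term in text.split()]

# Shuffle/reduce phase: group the pairs by term to form the inverted index.
index = defaultdict(set)
for term, doc_id in pairs:
    index[term].add(doc_id)

print(dict(index))  # e.g. {'the': {'doc1', 'doc2'}, 'quick': {'doc1'}, ...}
```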
My company, MC+A, also wrote a short article on the subject. I am very much looking forward to the next three years to see how this evolves.