Tuesday, October 20, 2009

Google Search Appliance (GSA) Sorting in Portal

At several of our clients, we have integrated the Google Search Appliance (GSA) into a portal. To accomplish this integration, we could take one of two approaches:

1. Use GSA’s built-in ability to format results for presentation with an XSLT stylesheet.

2. Use GSA’s ability to return raw XML and handle the presentation in the portal.
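To make the difference concrete, here is a minimal Python sketch of the two kinds of request URL. It assumes the standard GSA search protocol parameters (`q`, `site`, `client`, `output`, `start`, `num`, and `proxystylesheet`, which asks the appliance to apply a front end's XSLT). The host `gsa.example.com` and the `default_collection`/`default_frontend` names are placeholders for your own appliance configuration, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder appliance URL; substitute your own host.
GSA_SEARCH = "http://gsa.example.com/search"


def xslt_url(query, frontend="default_frontend", collection="default_collection"):
    """Option 1: the appliance applies the front end's XSLT and returns HTML."""
    params = {
        "q": query,
        "site": collection,
        "client": frontend,
        "output": "xml_no_dtd",
        "proxystylesheet": frontend,  # triggers server-side XSLT formatting
    }
    return GSA_SEARCH + "?" + urlencode(params)


def xml_url(query, start=0, num=100,
            collection="default_collection", frontend="default_frontend"):
    """Option 2: the appliance returns raw XML for the portal to process."""
    params = {
        "q": query,
        "site": collection,
        "client": frontend,
        "output": "xml_no_dtd",
        "start": start,  # zero-based offset of the first result
        "num": num,      # results per request (100 at most)
    }
    return GSA_SEARCH + "?" + urlencode(params)
```

Only the second form gives the portal the unformatted data it needs to do its own sorting later.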

Both approaches work well and can suit the needs of a portal. Option 1, however, will not work if you need to sort the entire result set before displaying it to users. The reasons are as follows:

1. GSA does not let you retrieve more than 100 results per request.

2. GSA’s built-in sorting only sorts the first 100 results.

3. Sorting on anything other than date or relevance (e.g., metadata) requires some XSLT work, and it is still bound by the same limitation of sorting only 100 results at a time.
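Why sorting 100 results at a time is not enough: the following Python sketch uses made-up data (250 results whose date values happen to run newest-first in relevance order) to compare sorting each block of 100 independently against sorting the complete set. The function `sort_per_page` is illustrative only and does not model GSA's internals.

```python
from operator import itemgetter


def sort_per_page(results, key, page_size=100):
    """Mimic sorting that only reorders results within each block of page_size."""
    out = []
    for i in range(0, len(results), page_size):
        out.extend(sorted(results[i:i + page_size], key=key))
    return out


# 250 hypothetical results in relevance order; their "date" values happen
# to run newest-first: 250, 249, ..., 1.
results = [{"rank": n, "date": 251 - n} for n in range(1, 251)]
by_date = itemgetter("date")

per_page = sort_per_page(results, by_date)  # what block-wise sorting yields
overall = sorted(results, key=by_date)      # what the user actually wants

print(per_page[0]["date"], overall[0]["date"])  # 151 1
```

Each block is ordered internally, but the oldest result (date 1) only reaches the top when the whole set is sorted at once.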

Option 2 still has the limitation of fetching 100 results at a time, but you can sort the results on the client side however your requirements dictate. Our approach typically involves the following steps:

1. Create client-side code that dynamically fetches the entire result set from GSA in blocks of 100 results until all available results have been retrieved.

2. Store the resulting composite XML in a cache for a predetermined amount of time. The cache key and expiration time should be configurable so that they can be adjusted as needed.

3. After fetching and storing the results, sort them according to the criteria the user requested.
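The fetch and sort steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not our production code: it assumes the response uses the usual GSA XML layout (`R` elements containing `U`, `T`, and `MT` metadata elements, the latter present only when metadata fields are requested), and `fetch_page` is a hypothetical hook that performs the HTTP request with `start` and `num` set. The `fake_fetch` function stands in for the appliance so the example runs offline, the `author` meta tag is made up, and the 1000-result cap is an arbitrary configurable choice.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

PAGE_SIZE = 100  # GSA returns at most 100 results per request


def parse_results(xml_text: str) -> List[Dict[str, str]]:
    """Turn one page of GSA XML into dicts holding URL, title and metadata."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        item = {"url": r.findtext("U", ""), "title": r.findtext("T", "")}
        for mt in r.findall("MT"):  # name/value meta tags, if requested
            item[mt.get("N")] = mt.get("V", "")
        results.append(item)
    return results


def fetch_all(fetch_page: Callable[[int, int], str],
              max_results: int = 1000) -> List[Dict[str, str]]:
    """Fetch blocks of PAGE_SIZE results until a short page or the cap."""
    results: List[Dict[str, str]] = []
    start = 0
    while start < max_results:
        page = parse_results(fetch_page(start, PAGE_SIZE))
        results.extend(page)
        if len(page) < PAGE_SIZE:
            break  # a short (or empty) page means there is nothing more
        start += PAGE_SIZE
    return results[:max_results]


def sort_results(results, field, descending=False):
    """Sort the complete result set on any field, metadata included."""
    return sorted(results, key=lambda r: r.get(field, ""), reverse=descending)


# Offline stand-in for the appliance: 250 documents with a made-up "author" tag.
def fake_fetch(start: int, num: int) -> str:
    rows = "".join(
        f'<R N="{i + 1}"><U>http://example.com/doc{i}</U><T>Doc {i}</T>'
        f'<MT N="author" V="author{i % 7}"/></R>'
        for i in range(start, min(start + num, 250))
    )
    return f"<GSP><RES>{rows}</RES></GSP>"


all_results = fetch_all(fake_fetch)
by_author = sort_results(all_results, "author")
print(len(all_results), by_author[0]["url"])  # 250 http://example.com/doc0
```

Stopping on a short page avoids depending on the total result count GSA reports, which may be approximate.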


Overall, Option 2 worked very well for us when the sorting requirements exceeded what GSA’s built-in mechanisms provide. The two costs to keep in mind are the memory needed for caching and the time required to fetch the results in chunks. In practice, we found that the memory requirements rarely had an adverse impact on our portal, and the fetch time was incurred only by the first requestor and was rarely noticeable.
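The caching step and its cost profile (only the first requestor pays for the full fetch) can be illustrated with a small time-to-live cache. This Python sketch is a stand-in for whatever caching region your portal provides: `ResultCache`, the default key normalization, and the 300-second TTL are hypothetical, and the clock is injectable so expiry can be demonstrated without waiting.

```python
import time


class ResultCache:
    """Hold fully fetched result sets for a configurable time-to-live."""

    def __init__(self, loader, ttl_seconds=300.0,
                 key_fn=lambda q: q.strip().lower(), clock=time.monotonic):
        self._loader = loader    # e.g. a function that fetches all GSA pages
        self._ttl = ttl_seconds  # configurable expiration time
        self._key_fn = key_fn    # configurable cache-key algorithm
        self._clock = clock
        self._entries = {}       # key -> (time stored, results)

    def get(self, query):
        key = self._key_fn(query)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: no round trips to the appliance
        results = self._loader(query)  # miss: this requestor pays the fetch
        self._entries[key] = (now, results)
        return results


calls = []


def load(query):
    calls.append(query)
    return [query.upper()]


fake_now = [0.0]
cache = ResultCache(load, ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("Portal")    # miss: loads
cache.get(" portal ")  # hit: same normalized key
fake_now[0] = 61
cache.get("portal")    # expired: loads again
print(len(calls))      # 2
```

The sketch is not thread-safe and only replaces an expired entry when the same key is requested again, so simultaneous first requests may both fetch and stale entries stay in memory; a production portal cache should handle locking and eviction.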