Thursday 15 January 2015

Solr "queryResultCache": queryResultWindowSize vs queryResultMaxDocsCached

I ran some tests to find out the difference between "queryResultWindowSize" and "queryResultMaxDocsCached".

Example config for the scenarios (both settings live in the <query> section of solrconfig.xml):

  <queryResultWindowSize>4</queryResultWindowSize>   
  <queryResultMaxDocsCached>16</queryResultMaxDocsCached>

Note: In all scenarios I always use the same query, but I restarted Solr between scenarios to flush the cache.

-------------------------------------------------------------     
Scenario 1:
We have a page size of 2 and go from one page to the next
------------------------------------------------------------- 
Start:  0 Rows: 2 -> Executes Query, returns docs 0-1 and caches docs 0-3
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Executes Query, returns docs 4-5 and caches docs 0-7
                                         (Note: It replaces the existing cache entry for this query
                                          -> There is always only one cache entry for a query)
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because the window would end at doc 19 (20 docs), which exceeds the "queryResultMaxDocsCached" setting of 16

------------------------------------------------------------- 
Scenario 2:
We have a page size of 2 but start with a high "start" parameter
------------------------------------------------------------- 
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them
Start:  0 Rows: 2 -> Retrieves docs 0-1 from cache and returns them
Start:  2 Rows: 2 -> Retrieves docs 2-3 from cache and returns them
Start:  4 Rows: 2 -> Retrieves docs 4-5 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start: 12 Rows: 2 -> Executes Query, returns docs 12-13 and caches docs 0-15
Start: 14 Rows: 2 -> Retrieves docs 14-15 from cache and returns them
Start: 16 Rows: 2 -> Executes Query, returns docs 16-17 and does NOT cache anything because the window would end at doc 19 (20 docs), which exceeds the "queryResultMaxDocsCached" setting of 16

------------------------------------------------------------- 
Scenario 3:
We start with a high "rows" query parameter
------------------------------------------------------------- 
Start:  0 Rows: 8 -> Executes Query, returns docs 0-7 and caches docs 0-7
Start:  0 Rows: 4 -> Retrieves docs 0-3 from cache and returns them
Start:  6 Rows: 2 -> Retrieves docs 6-7 from cache and returns them
Start:  8 Rows: 2 -> Executes Query, returns docs 8-9 and caches docs 0-11
Start: 10 Rows: 2 -> Retrieves docs 10-11 from cache and returns them

From this I conclude

1) The Solr "queryResultCache" always caches from the first document of the query result - not from the "start" query-parameter

2) "queryResultWindowSize" setting: In our example the "windows" were documents 0-3, 4-7, 8-11, 12-15, ... The "start" + "rows" query-parameters determine which "window" is used and the "window" in turn determines the end document to be cached (the upper end of the window)

3) "queryResultMaxDocsCached" setting: A threshold indicating if the query result should be cached or not. If the end document to be cached (the upper end of the window) is higher than the "queryResultMaxDocsCached" setting the query result will not be cached.  (In my opinion the name of the parameter is very unfortunate)
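The three conclusions above can be written down as a small model. This is only a sketch of the behaviour I observed in the scenarios, not Solr's actual implementation. The names window_end and replay are made up for illustration, and I assume that a request which is not cached leaves the existing cache entry untouched and that the query matches enough documents to fill each window.

```python
def window_end(start, rows, window_size):
    """Round start + rows up to the next multiple of window_size (assumes rows >= 1)."""
    requested_end = start + rows
    return -(-requested_end // window_size) * window_size  # ceiling division


def replay(requests, window_size, max_docs_cached):
    """Replay (start, rows) requests for ONE query against a one-entry cache model.

    Returns a list of (start, rows, hit, cached_docs) tuples. cached_docs is the
    number of leading documents (docs 0 .. cached_docs - 1) held after the request.
    """
    cached_docs = 0  # the entry always starts at document 0 (conclusion 1)
    log = []
    for start, rows in requests:
        hit = start + rows <= cached_docs
        if not hit:
            end = window_end(start, rows, window_size)  # conclusion 2
            if end <= max_docs_cached:                  # conclusion 3
                cached_docs = end  # replaces the previous entry for this query
            # Assumption: otherwise the old entry simply stays as it is.
        log.append((start, rows, hit, cached_docs))
    return log


# Scenario 1: page size 2, paging forward from start=0 to start=16.
scenario_1 = [(start, 2) for start in range(0, 18, 2)]
for start, rows, hit, cached in replay(scenario_1, window_size=4, max_docs_cached=16):
    source = "cache" if hit else "query"
    print(f"Start: {start:2d} Rows: {rows} -> served from {source}, cache holds docs 0-{cached - 1}")
```

Feeding the request sequences of Scenario 2 and Scenario 3 into replay gives the same hit/miss pattern as listed above.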

Here is a suggestion for when your default page size ("rows" query-parameter) is 10:
  • A "queryResultWindowSize" setting of 20 will load the first two pages into the cache when the "start" query-parameter is 0 (or anything up to and including 10).
  • A "queryResultMaxDocsCached" setting of 40 will also allow the third and fourth pages to be cached (when the "start" query-parameter is 20, or anything up to and including 30). From the fifth page on, the query result will not be cached.
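To double-check this suggestion, here is a quick calculation based on the same rule I derived from my tests (again a model of the observed behaviour, not Solr code). The function name cached_window is hypothetical. It returns the number of leading documents that would be cached for a request, or None if the result would not be cached.

```python
def cached_window(start, rows, window_size=20, max_docs_cached=40):
    """Return how many leading docs get cached for this request, or None if not cached."""
    end = -(-(start + rows) // window_size) * window_size  # round up to a full window
    return end if end <= max_docs_cached else None


PAGE_SIZE = 10
for page in range(1, 7):
    start = (page - 1) * PAGE_SIZE
    window = cached_window(start, PAGE_SIZE)
    result = f"caches docs 0-{window - 1}" if window is not None else "not cached"
    print(f"page {page} (start={start:2d}): {result}")
```

Pages 1 and 2 share the window 0-19, pages 3 and 4 extend the entry to 0-39, and from page 5 on the model reports that nothing is cached.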

I also read (did not verify):

The Solr "queryResultCache" caches the document ids and optionally the scores (if you ask for the scores). That means that either 4 or 8 bytes per document are cached.
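
If those numbers are right, the payload of one cache entry is easy to estimate. The sketch below takes the 4 and 8 bytes per document from the claim above, which I have not verified, and it ignores the cache key and any object overhead. The function name entry_payload_bytes is made up for illustration.

```python
def entry_payload_bytes(cached_docs, with_scores=False):
    """Rough payload of one cache entry: 4 bytes per doc id, plus 4 per score if requested."""
    bytes_per_doc = 8 if with_scores else 4  # unverified figures from the text above
    return cached_docs * bytes_per_doc


# With the example config, an entry holds at most 16 documents.
print(entry_payload_bytes(16))                    # ids only
print(entry_payload_bytes(16, with_scores=True))  # ids and scores
```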
