Tuesday, April 8, 2014

Nested Aggregations

I was looking at aggregations today and found that they have the same grouping problem as an AND query across two fields of a sub-document. This makes sense given the way Lucene stores documents with multi-valued fields: unless the sub-documents are mapped as nested, all the values for a field are grouped together and the link between values from the same sub-document is lost. I will walk through how to avoid some of the problems that can arise.
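To make the flattening concrete, here is a minimal Python sketch that only imitates the behavior described above; it does not talk to Elasticsearch or Lucene. The park document, its field names, and the `flatten` helper are all made up for illustration.

```python
# Hypothetical park document; names and values are illustrative only.
park = {
    "name": "Riverside",
    "trees": [
        {"type": "dogwood", "height": 12},
        {"type": "oak", "height": 60},
    ],
}


def flatten(doc, path):
    """Imitate a non-nested mapping: one multi-valued field per leaf of the sub-documents."""
    flat = {}
    for item in doc[path]:
        for key, value in item.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


flat = flatten(park, "trees")
print(flat)

# "type is dogwood AND height is 60" matches the flattened document,
# even though no single tree is a 60-unit-tall dogwood.
false_match = "dogwood" in flat["trees.type"] and 60 in flat["trees.height"]
print(false_match)
```

Once the values sit in two independent lists, nothing records which height went with which type, which is the root of both the AND problem and the aggregation problem.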

First, let's set up our data.

Create schema with nested

Response

Index some nested documents

Index some documents with a default mapping

Query for aggregations with the default mapping

Response

We can see that the numbers are way too high. There is only one dogwood tree in all the parks, yet the response says its total height is 179. This is because when the aggregation groups on dogwood, it cannot tell which height belongs to it. So for dogwood it gives the total for all the trees in the park.
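The inflated totals can be reproduced with a small Python model of a terms bucket with a sum inside it, run over flattened documents. This is a sketch under stated assumptions: the parks and heights below are hypothetical and are not the documents indexed in this post, and `flattened_terms_sum` is an illustrative helper, not an Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical data, not the documents from the post.
parks = [
    {"trees": [{"type": "dogwood", "height": 12}, {"type": "oak", "height": 60}]},
    {"trees": [{"type": "oak", "height": 55}, {"type": "pine", "height": 40}]},
]


def flattened_terms_sum(docs):
    """Group on tree type and sum heights the way a non-nested mapping sees the data."""
    totals = defaultdict(int)
    for doc in docs:
        types = {tree["type"] for tree in doc["trees"]}
        # The sum sees every height in the document, not just the matching tree's.
        doc_height = sum(tree["height"] for tree in doc["trees"])
        for tree_type in types:
            totals[tree_type] += doc_height
    return dict(totals)


inflated = flattened_terms_sum(parks)
print(inflated)
```

In this made-up data the single dogwood is 12 tall, but its bucket reports 72 because the oak in the same park is counted too.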

To avoid this, we need to map the tree sub-documents as nested.

Query for aggregations with nested

Response

As you can see, when I query the nested index I get the expected counts.
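For reference, here is a rough sketch of what the nested mapping fragment and the nested aggregation request could look like, written as Python dictionaries. The `nested` field type and the `nested` aggregation with its `path` parameter are part of the Elasticsearch query DSL, but the field names (`trees`, `trees.type`, `trees.height`) and the aggregation names are assumptions, since the original requests are not reproduced here. Where the mapping fragment sits inside the create-index body depends on the Elasticsearch version.

```python
import json

# Mapping fragment: mark the (hypothetical) "trees" field as nested.
tree_mapping = {"properties": {"trees": {"type": "nested"}}}

# Aggregation request: enter the nested documents first, then bucket and sum.
nested_agg = {
    "size": 0,  # only the aggregation results are wanted, not the hits
    "aggs": {
        "trees": {
            "nested": {"path": "trees"},
            "aggs": {
                "by_type": {
                    "terms": {"field": "trees.type"},
                    "aggs": {
                        "total_height": {"sum": {"field": "trees.height"}}
                    },
                }
            },
        }
    },
}

request_body = json.dumps(nested_agg, indent=2)
print(request_body)
```

The key design point is that the terms and sum aggregations sit inside the `nested` aggregation, so each tree is bucketed and summed as its own document.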

Below I show a limitation of nested when doing aggregations. Since nested stores each sub-document as a separate Lucene document under the covers, an aggregation running inside the nested documents cannot reach back out to aggregate data from the parent document.

Query for aggregations with nested that reaches outside the sub-document

Response

There are no results because the field being summed cannot be found from inside the nested document. To work around this, the "terms" field would have to be moved out of the nested document. Alternatively, you can push the data being summed into each of the nested documents.
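The second workaround can be sketched in plain Python. This is only a model of the idea: the parent-level `visitors` field, the helper names, and the use of `None` for "nothing to sum" are all assumptions, and the exact shape of the real Elasticsearch response is not modeled.

```python
# Hypothetical park with a parent-level field and nested trees.
park = {
    "visitors": 500,
    "trees": [{"type": "dogwood", "height": 12}, {"type": "oak", "height": 60}],
}


def nested_docs(doc, path, copy_fields=()):
    """Model each nested object as its own document, optionally copying parent fields in."""
    copied = {field: doc[field] for field in copy_fields}
    return [{**item, **copied} for item in doc[path]]


def terms_sum(docs, term_field, sum_field):
    """Bucket by term_field and sum sum_field; None means no document in the bucket had it."""
    values = {}
    for d in docs:
        bucket = values.setdefault(d[term_field], [])
        if sum_field in d:
            bucket.append(d[sum_field])
    return {term: (sum(v) if v else None) for term, v in values.items()}


# Inside the nested documents the parent field is simply not there.
unreachable = terms_sum(nested_docs(park, "trees"), "type", "visitors")

# Pushing the parent value into every nested document makes it summable.
pushed = terms_sum(nested_docs(park, "trees", ["visitors"]), "type", "visitors")

print(unreachable)
print(pushed)
```

The cost of this approach is duplicated data: the parent value is stored once per nested document and has to be kept in sync when it changes.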
