
Suppose, in Elasticsearch 5, I have data with nesting like:

{"number":1234, "names": [ 
  {"firstName": "John", "lastName": "Smith"}, 
  {"firstName": "Al", "lastName": "Jones"}
]},  
...

And I want to query for hits with number 1234 but return only the names that match "lastName": "Jones", so that my result omits names that don't match. In other words, I want to get back only part of the matching document, based on a term query or similar.

A simple nested query won't do, since it only filters top-level results. Any ideas?

{ "query" : { "bool": { "filter":[
    { "term": { "number":1234} },
    ????  something with "lastName": "Jones" ????
] } } }

I want back:

hits: [
   {"number":1234, "names": [ 
     {"firstName": "Al", "lastName": "Jones"}
   ]},  
   ...
]
  • The second answer should get you what you need, right? Commented Aug 11, 2017 at 4:31
  • Did you find a good solution for your purpose? The accepted answer doesn't seem to be a solution, as you also commented below. I need exactly the same filtering on nested objects, but the inner hits are returned separately and the entire nested object list is returned as well. Is it perhaps not possible with nested objects? Did you end up using parent-child? See also my question here: stackoverflow.com/questions/48750696/… Commented Feb 12, 2018 at 17:20
  • I did not find exactly what I wanted. If I were in charge of elastic, I'd probably add this feature! Commented Feb 12, 2018 at 22:32
  • @Phillip Baumann's answer is what you need. Please check it and my comment. Commented May 21, 2020 at 20:24
  • @Algorini Please read Emil's comment above. If I have multiple nested queries combined with the must operator on the same array in an ES document, the inner hits are returned separately. How can I get the common, combined inner hits? Please help; if you need more details, let me know. Commented Feb 8, 2022 at 10:06

3 Answers


The hits section returns a _source, which is exactly the document you indexed.

You are right, nested query filters top-level results, but with inner_hits it will show you which inner nested objects caused these top-level documents to be returned, and this is exactly what you need.

The names field can be excluded from top-level hits using the _source parameter.

{
   "_source": {
      "excludes": ["names"]
   },
   "query":{
      "bool":{
         "must":[
            {
               "term":{
                  "number":{
                     "value":"1234"
                  }
               }
            },
            {
               "nested":{
                  "path":"names",
                  "query":{
                     "term":{
                        "names.lastName":"Jones"
                     }
                  },
                  "inner_hits":{
                  }
               }
            }
         ]
      }
   }
}

So now top-level documents are returned without the names field, and each hit has an additional inner_hits section with the names that match.
You should treat nested objects as part of a top-level document. If you really need them to be separate, consider parent/child relations.
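
If you want the trimmed document back in one piece, the matching names can be merged into each hit on the client. Here is a minimal Python sketch of that step. The function name merge_inner_hits and the sample_response dict are made up for illustration (the dict is hand-written in the shape of a search response, not real output), and it assumes the inner hits are keyed by the nested path "names".

```python
def merge_inner_hits(response, path="names"):
    """Rebuild each hit's document, keeping only the nested objects that matched."""
    docs = []
    for hit in response["hits"]["hits"]:
        # _source no longer contains the nested field because it was excluded
        doc = dict(hit.get("_source", {}))
        inner = hit.get("inner_hits", {}).get(path, {})
        matches = inner.get("hits", {}).get("hits", [])
        # Put back only the nested objects reported under inner_hits
        doc[path] = [m["_source"] for m in matches]
        docs.append(doc)
    return docs


# Hand-written sample in the shape of a search response (assumption, not real output)
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"number": 1234},
                "inner_hits": {
                    "names": {
                        "hits": {
                            "hits": [
                                {"_source": {"firstName": "Al", "lastName": "Jones"}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(merge_inner_hits(sample_response))
# [{'number': 1234, 'names': [{'firstName': 'Al', 'lastName': 'Jones'}]}]
```

Keep in mind that inner_hits returns only the top three matches per hit by default, so you would likely need to set "size" inside "inner_hits" if a document can have more matching names than that.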


2 Comments

This answer is wrong, for the simple reason that the _source field of the top-level hit will still include all the nested documents. Yes, you will get matching nested documents in the inner_hits, but your response will bloat, as it still carries all the nested hits in the original _source field. The right method is to exclude "names" by source_filtering, and use inner_hits to keep only matching nested docs. See @Phillip Baumann's answer below for the correct query.
@user2076066 Good catch, I've updated my answer with source filtering.

A similar but slightly different approach: put the nested query in a should clause and then read the matching names from inner_hits. This returns the top-level document, and inner_hits contains whichever names matched.

   { 
      "_source": {
        "excludes": ["names"]
      },
       "query":{
          "bool":{
             "must":[
                {
                   "term":{
                      "number":{
                         "value":"1234"
                      }
                   }
                }
             ],
             "should": [
             {
                "nested":{
                   "path":"names",
                   "query":{
                      "term":{
                         "names.lastName":"Jones"
                      }
                   },
                   "inner_hits":{
                   }
                }
             }

             ]
          }
       }
    }
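
To make the difference between the two placements concrete, here is a rough Python sketch that builds the request body either way. The helper build_query and its require_match flag are hypothetical names for illustration only. As far as I understand it, with the nested clause under should (next to a must clause) the clause is optional, so a document with number 1234 comes back even when no name matches and its inner_hits is then empty; under must, such a document is dropped.

```python
import json


def build_query(number, last_name, require_match=True):
    """Build the search body; the nested clause goes in must or in should."""
    nested = {
        "nested": {
            "path": "names",
            "query": {"term": {"names.lastName": last_name}},
            "inner_hits": {},
        }
    }
    bool_query = {"must": [{"term": {"number": number}}]}
    if require_match:
        # Only documents that have at least one matching name are returned
        bool_query["must"].append(nested)
    else:
        # The nested clause is optional here and mainly serves to collect inner_hits
        bool_query["should"] = [nested]
    return {"_source": {"excludes": ["names"]}, "query": {"bool": bool_query}}


print(json.dumps(build_query(1234, "Jones", require_match=False), indent=2))
```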

2 Comments

This is the right answer. The reason is that it excludes the nested "names" field, and the inner hits will only show the matching nested objects. It means you need to reassemble your document a bit (client-side). In the answer chosen as "correct", the original purpose is defeated, as the _source of the top-level hit will still carry all the nested documents, and your response in fact bloats even more.
The part with exclusion is correct, although the nested filter has to be in must section, not in should one.

Try something like this

{
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  {
                     "term": { "number": 1234 }
                  },
                  {
                     "nested": {
                        "path": "something",
                        "query": {
                           "term": {
                              "something.lastName": "Jones"
                           }
                        },
                        "inner_hits" : {}
                     }
                  }
               ]
            }
         }
      }
   }
}

I used this reference.

3 Comments

Good try, but that filters the documents based on nested content, so I still get both names in my result. (Also, you forgot an intermediate "query" : { "bool": { inside the nested.) I want to filter within the document so that I get only one name for number 1234.
Running it as you wrote it results in parsing_exception no [query] registered for [filtered] on line 3. I'm not sure how to fix your query to try it. I'm guessing you forgot an intermediate "query" : { "bool": { inside the "nested". I also think you don't need "filtered", nor would you want a "filter" outside the "bool". Keep trying and thank you!
Reading the inner_hits docs and trying it, I don't think it would help me. inner_hits could give you, in a separate section, different ways of paging or summarizing the inner elements, but I want to filter some of them out of the results altogether. Correct me if I'm wrong. See also: elastic.co/guide/en/elasticsearch/reference/current/…
