6

I am facing a trouble in the use of ElasticSearch for my java application. I explain myself, I have a mapping, which is something like :

{
"products": {
    "properties": {
        "id": {
            "type": "long",
                   "ignore_malformed": false
        },
        "locations": {
            "properties": {
                "category": {
                    "type": "long",
                   "ignore_malformed": false
                },
                "subCategory": {
                    "type": "long",
                   "ignore_malformed": false
                },
                "order": {
                    "type": "long",
                   "ignore_malformed": false
                }
            }
        },
...

So, as you can see, I receive a list of products, which are composed of locations. In my model, this locations are all the categories' product. It means that a product can be in 1 or more categories. In each of this category, the product has an order, which is the order the client wants to show them.

For instance, a diamond product can have a first place in Jewelry, but the third place in Woman (my examples are not so logic ^^). So, when I click on Jewelry, I want to show this products, ordered by the field locations.order in this specific category.

For the moment, when I search all the products on a specific category the response for ElasticSearch that I receive is something like :

{"id":5331880,"locations":[{"category":5322606,"order":1},
{"category":5883712,"subCategory":null,"order":3},
{"category":5322605,"subCategory":6032961,"order":2},.......

Is it possible to sort this products, by the element locations.order for the specific category I am searching for ? For instance, if I am querying the category 5322606, I want the order 1 for this product to be taken.

Thank you very much beforehand ! Regards, Olivier.

2 Answers 2

9

First a correction of terminology: in Elasticsearch, "parent/child" refers to completely separate docs, where the child doc points to the parent doc. Parent and children are stored on the same shard, but they can be updated independently.

With your example above, what you are trying to achieve can be done with nested docs.

Currently, your locations field is of type:"object". This means that the values in each location get flattened to look something like this:

{ 
    "locations.category": [5322606, 5883712, 5322605],
    "locations.subCategory": [6032961],
    "locations.order": [1, 3, 2]
}

In other words, the "sub" fields get flattened into multi-value fields, which is of no use to you, because there is no correlation between category: 5322606 and order: 1.

However, if you change locations to be type:"nested" then internally it will index each location as a separate doc, meaning that each location can be queried independently, using the dedicated nested query and filter.

By default, the nested query will return a _score based upon how well each location matches, but in your case you want to return the highest value of the order field from any matching children. To do this, you'll need to use a custom_score query.

So let's start by creating the index with the appropriate mapping:

curl -XPUT 'http://127.0.0.1:9200/test/?pretty=1'  -d '
{
   "mappings" : {
      "products" : {
         "properties" : {
            "locations" : {
               "type" : "nested",
               "properties" : {
                  "order" : {
                     "type" : "long"
                  },
                  "subCategory" : {
                     "type" : "long"
                  },
                  "category" : {
                     "type" : "long"
                  }
               }
            },
            "id" : {
               "type" : "long"
            }
         }
      }
   }
}
'

The we index your example doc:

curl -XPOST 'http://127.0.0.1:9200/test/products?pretty=1'  -d '
{
   "locations" : [
      {
         "order" : 1,
         "category" : 5322606
      },
      {
         "order" : 3,
         "subCategory" : null,
         "category" : 5883712
      },
      {
         "order" : 2,
         "subCategory" : 6032961,
         "category" : 5322605
      }
   ],
   "id" : 5331880
}
'

And now we can search for it using the queries we discussed above:

curl -XGET 'http://127.0.0.1:9200/test/products/_search?pretty=1'  -d '
{
   "query" : {
      "nested" : {
         "query" : {
            "custom_score" : {
               "script" : "doc[\u0027locations.order\u0027].value",
               "query" : {
                  "constant_score" : {
                     "filter" : {
                        "and" : [
                           {
                              "term" : {
                                 "category" : 5322605
                              }
                           },
                           {
                              "term" : {
                                 "subCategory" : 6032961
                              }
                           }
                        ]
                     }
                  }
               }
            }
         },
         "score_mode" : "max",
         "path" : "locations"
      }
   }
}
'

Note: the single quotes within the script have been escaped as \u0027 to get around shell quoting. The script actually looks like this: "doc['locations.order'].value"

If you look at the _score from the results, you can see that it has used the order value from the matching location:

{
   "hits" : {
      "hits" : [
         {
            "_source" : {
               "locations" : [
                  {
                     "order" : 1,
                     "category" : 5322606
                  },
                  {
                     "order" : 3,
                     "subCategory" : null,
                     "category" : 5883712
                  },
                  {
                     "order" : 2,
                     "subCategory" : 6032961,
                     "category" : 5322605
                  }
               ],
               "id" : 5331880
            },
            "_score" : 2,
            "_index" : "test",
            "_id" : "cXTFUHlGTKi0hKAgUJFcBw",
            "_type" : "products"
         }
      ],
      "max_score" : 2,
      "total" : 1
   },
   "timed_out" : false,
   "_shards" : {
      "failed" : 0,
      "successful" : 5,
      "total" : 5
   },
   "took" : 9
}
Sign up to request clarification or add additional context in comments.

4 Comments

Hello DrTech, Thank you so much for your answer, it is exactly what I was looking for. I modified my mapping and it works fine. However, I do not know how to do the custom_score request with the java API, and there is nothing mentionned about it. Could you help me with that ? Thank you again ! It is really much appreciated. Have a great week end. Olivier
I waited to know if you have some feedbacks on the java API. I just accepted the answer since it is well explained and my understanding is much better now. Do you have some knowledge with the java API ? Thank you very much beforehand.
No. I haven't used the Java API
is it also possible to filter the document for other properties and not the nested properties after the nested filtering?
1

Just add a more updated version related to sorting parent by child field. We can query parent doc type sorted by child field ('count' e.g.) similar as follows.

https://gist.github.com/robinloxley1/7ea7c4f37a3413b1ca16

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.