
I'm looking for a way to do exact array matches in Elasticsearch. Let's say these are my documents:

{"id": 1, "categories" : ["c", "d"]}
{"id": 2, "categories" : ["b", "c", "d"]}
{"id": 3, "categories" : ["c", "d", "e"]}
{"id": 4, "categories" : ["d"]}
{"id": 5, "categories" : ["c", "d"]}

Is there a way to search for all documents that have exactly the categories "c" and "d" (documents 1 and 5), no more and no less?

As a bonus, searching for "one of these" categories should still be possible (for example, searching for "c" should return 1, 2, 3 and 5).

Any clever way to tackle this problem?

2 Answers

If you have a discrete, known set of categories, you could use a bool query that requires each wanted category and excludes all the others:

{
  "query": {
    "bool": {
      "must": [
        { "term": { "categories": "c" } },
        { "term": { "categories": "d" } }
      ],
      "must_not": [
        { "terms": { "categories": ["a", "b", "e"] } }
      ]
    }
  }
}
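To make the known-set approach concrete, here is a minimal Python sketch that generates that bool query from the wanted categories and the full category list, plus a tiny local evaluator that mimics the must/must_not logic on the sample documents from the question. It assumes `categories` is mapped as a `keyword` field (so `term` matches exact values); the helper names are hypothetical, and the evaluator only imitates the query's semantics, it does not talk to Elasticsearch.

```python
import json

# Sample documents from the question: id -> categories
DOCS = {1: ["c", "d"], 2: ["b", "c", "d"], 3: ["c", "d", "e"], 4: ["d"], 5: ["c", "d"]}


def exact_match_query(wanted, known_categories):
    """Hypothetical helper: require every wanted category and exclude every
    other category from a known, finite set."""
    wanted = sorted(set(wanted))
    unwanted = sorted(set(known_categories) - set(wanted))
    # one term clause per required category, so all of them must match
    bool_query = {"must": [{"term": {"categories": c}} for c in wanted]}
    if unwanted:
        # any other known category disqualifies the document
        bool_query["must_not"] = [{"terms": {"categories": unwanted}}]
    return {"query": {"bool": bool_query}}


def local_match(categories, query):
    """Mimic the bool query locally: every must clause hits, no must_not hits."""
    cats = set(categories)
    clauses = query["query"]["bool"]
    must_ok = all(c["term"]["categories"] in cats for c in clauses["must"])
    excluded = any(cats & set(c["terms"]["categories"])
                   for c in clauses.get("must_not", []))
    return must_ok and not excluded


query = exact_match_query(["c", "d"], ["a", "b", "c", "d", "e"])
print(json.dumps(query, indent=2))
print([doc_id for doc_id, cats in DOCS.items() if local_match(cats, query)])
```

The weak spot is the known-set requirement: a category that is missing from `known_categories` is never excluded, so a document carrying it would slip through.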

Otherwise, probably the easiest way to accomplish this is to store an extra field that serves as a key for the whole set of categories:

{"id": 1, "categories" : ["c", "d"], "categorieskey" : "cd"}

Something like that. Then a term query on that field returns precisely the results you want:

{ "term" : { "categorieskey" : "cd" } }
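A minimal Python sketch of building that key at index time, assuming `categorieskey` is mapped as a `keyword` field. Two choices here are my own assumptions rather than part of the answer: the categories are sorted and de-duplicated so their order doesn't matter, and they are joined with a `|` separator (instead of plain `cd`) so that `["cd"]` and `["c", "d"]` don't collide. The helper names are hypothetical.

```python
def categories_key(categories, sep="|"):
    # sorted and de-duplicated, so ["d", "c"] and ["c", "d", "d"] share a key;
    # the separator keeps ["cd"] distinct from ["c", "d"]
    return sep.join(sorted(set(categories)))


def with_key(doc):
    # copy of the document with the derived key field added before indexing
    return {**doc, "categorieskey": categories_key(doc["categories"])}


def exact_key_query(wanted):
    # the same key function must be used at query time so the keys line up
    return {"query": {"term": {"categorieskey": categories_key(wanted)}}}


print(with_key({"id": 1, "categories": ["d", "c"]}))
print(exact_key_query(["c", "d"]))
```

The application has to recompute the key whenever a document's categories change, since the two fields are kept in sync only by whatever code does the indexing.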

And you could still search non-exclusively on the original field:

{ "term" : { "categories" : "c" } }
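For the "one of these" bonus with several categories, a `terms` query on the original array field matches documents containing at least one of the listed values. A small Python sketch that builds it and imitates its behavior locally on the question's sample documents (the helper names are hypothetical, and the local check does not query Elasticsearch):

```python
# Sample documents from the question: id -> categories
DOCS = {1: ["c", "d"], 2: ["b", "c", "d"], 3: ["c", "d", "e"], 4: ["d"], 5: ["c", "d"]}


def any_of_query(categories):
    # terms query: a document matches if it has at least one of the values
    return {"query": {"terms": {"categories": list(categories)}}}


def any_of_local(doc_categories, categories):
    # local imitation of the terms query: a non-empty intersection
    return bool(set(doc_categories) & set(categories))


print(any_of_query(["c"]))
print([doc_id for doc_id, cats in DOCS.items() if any_of_local(cats, ["c"])])
```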

Querying for two categories that must both be present is easy enough, but preventing any other categories from being present is harder. You could probably do it: write a query that finds records containing both, then apply a filter that eliminates any records with categories other than the specified ones. As far as I know, though, this isn't the kind of search Lucene is designed to handle.

Honestly, I'm having some trouble coming up with a good filter to use here. You might need a script filter, or you could filter the results after they have been retrieved.
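The "filter after retrieval" option could look like this minimal Python sketch: run the easy query (both categories present), then drop any hit whose category set isn't exactly the wanted one. It assumes hits arrive in the usual search-response shape (`hits.hits[]._source`); the sample response below is hand-built from the question's documents, and the helper name is hypothetical.

```python
def exact_post_filter(hits, wanted):
    """Keep only hits whose categories are exactly the wanted set."""
    wanted = set(wanted)
    return [h for h in hits if set(h["_source"]["categories"]) == wanted]


# Hand-built stand-in for the hits of a "must contain c and d" query
response = {"hits": {"hits": [
    {"_id": "1", "_source": {"id": 1, "categories": ["c", "d"]}},
    {"_id": "2", "_source": {"id": 2, "categories": ["b", "c", "d"]}},
    {"_id": "3", "_source": {"id": 3, "categories": ["c", "d", "e"]}},
    {"_id": "5", "_source": {"id": 5, "categories": ["c", "d"]}},
]}}

kept = exact_post_filter(response["hits"]["hits"], ["c", "d"])
print([h["_source"]["id"] for h in kept])
```

The downside is paging: hits removed on the client don't count toward the requested page size or the reported total, so you may need to fetch more results than you display.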


Comments

funny, that's exactly what i told him :)
This query won't run. minimum_match doesn't appear to be a valid parameter to a TermsFilter.
@Conrad.Dean Who said anything about using a filter?
@femtoRgon oh woops. i was way too zoomed in. the syntax is identical if that's wrapped with a "filter":{...} instead of just a "query":{...}. minimum_should_match is a valid parameter in a terms query elasticsearch.org/guide/en/elasticsearch/reference/current/…
Ah, you're right, probably should be minimum_should_match. Not sure whether that's a change in ElasticSearch, or a mistake, but certainly doesn't hurt to update. Thanks.

I found a solution for our use case that appears to work. It relies on filters plus the knowledge of how many categories we want to match against: one part requires every wanted value to be present, and a script filter checks the size of the array. In this example, marketBasketList plays the role of your categories field.

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "siteId": 4 } },
        { "term": { "marketBasketList": 10 } },
        { "term": { "marketBasketList": 11 } },
        {
          "script": {
            "script": {
              "source": "doc['marketBasketList'].size() == 2",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
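The same pattern (require each wanted value, then compare the array size with the number of wanted values) can be generated for any list. A minimal Python sketch, assuming a recent Elasticsearch with Painless scripting and the `script` query; passing the count through `params` is my own choice rather than part of the answer, and the helper names are hypothetical. The local check only imitates the filter logic and assumes the stored arrays contain no duplicate values.

```python
def size_checked_query(field, values, extra_filters=()):
    """Build a filter-only bool query: every value present, exact array size."""
    values = sorted(set(values))
    filters = list(extra_filters)
    # one term filter per wanted value, so all of them must be present
    filters += [{"term": {field: v}} for v in values]
    # script filter rejects arrays holding anything beyond the wanted values
    filters.append({"script": {"script": {
        "source": f"doc['{field}'].size() == params.n",
        "lang": "painless",
        "params": {"n": len(values)},
    }}})
    return {"query": {"bool": {"filter": filters}}}


def size_checked_local(stored, values):
    # imitation of the two filters: all values present and the same count
    wanted = set(values)
    return wanted.issubset(stored) and len(stored) == len(wanted)


q = size_checked_query("marketBasketList", [10, 11], [{"term": {"siteId": 4}}])
print(q)
for basket in ([10, 11], [11, 10], [10, 11, 12], [11]):
    print(basket, size_checked_local(basket, [10, 11]))
```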
