I thought this would be simple, but it is turning out to be quite complicated.

We want to be able to query our ElasticSearch instance for fields that are empty and for fields that are not empty. Strings are what cause the problem. My definitions of empty and not empty are:

Empty

  • It does not exist.
  • It does exist but the value is NULL or an empty string (for strings).

Not empty

  • It does exist.
  • It has a value that is not NULL or empty string (for strings).
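
To pin down the two definitions above, here is a minimal Python sketch of the same rule applied to an already-fetched document held as a plain dict. The name `is_empty` and the dict representation are assumptions made for illustration; nothing here talks to Elasticsearch.

```python
from typing import Any, Mapping


def is_empty(doc: Mapping[str, Any], field: str) -> bool:
    """Return True when `field` is absent, None (JSON null), or an empty string."""
    if field not in doc:
        # The field does not exist at all.
        return True
    value = doc[field]
    # The field exists but carries null or "".
    return value is None or value == ""


def is_not_empty(doc: Mapping[str, Any], field: str) -> bool:
    """Exact negation of is_empty: the field exists and holds a real value."""
    return not is_empty(doc, field)
```

Note that values such as 0 or False count as not empty under this rule, which matches the intent of treating only strings specially.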

I have read about several ways to do this, and all of them seem to involve some complexity: the old missing filter, a script clause in the query that compares the length with 0, a term query, and so on. Implementing a should_not to mimic the logic described above does not seem to work in my version either.

Ideally, it would be fantastic if we could use the exists operator everywhere, as it could be used with all the types we have, date, integers, strings, etc.

There is something that I was assuming but that does not seem to be true at least in my case (using ElasticSearch 5.5.0):

"Elasticsearch does not index empty strings"

My understanding is that if this were true, we could use exists on that string field too. The queries are generated automatically by a module we wrote, so a simpler query would also simplify the coding of the new functionality: the same operator would be used in all cases.

I have tried to add keywords as a plain field:

...

:field                {:type "keyword"}

...

And also nested:

{:type     "text"
 :analyzer "standard"
 :fields   {:raw        {:type "keyword"}}}

But nothing seems to work:

{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": { "field": "field.raw" }
        }
        ...
        ...
      ],

All empty strings are detected as if they existed. Is there any change we could make so that empty strings are treated as missing?

2 Answers

4

An empty string such as "" is treated as an existing field, so the exists query matches it. To identify fields that are empty as per your definition, you can use the query below:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "someField"
                }
              }
            ]
          }
        },
        {
          "term": {
            "someField": ""
          }
        }
      ]
    }
  }
}

Replace someField in the query above with the name of the actual field in your index.
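
Since the question mentions that the queries are generated by a module, the query above could be produced by a small builder. The following is only a sketch in Python: the function names `empty_query` and `not_empty_query` are hypothetical, the "not empty" variant is simply the logical negation of the query above rather than something stated in this answer, and it assumes the field is mapped as keyword (for example the field.raw sub-field from the question), because a term query is not analyzed.

```python
import json
from typing import Any, Dict


def empty_query(field: str) -> Dict[str, Any]:
    """Match documents where `field` is missing/null OR equals the empty string."""
    return {
        "bool": {
            "should": [
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
                {"term": {field: ""}},
            ],
            # Spelled out for clarity: at least one of the branches must match.
            "minimum_should_match": 1,
        }
    }


def not_empty_query(field: str) -> Dict[str, Any]:
    """Negation of empty_query: the field exists AND is not the empty string."""
    return {
        "bool": {
            "must": [{"exists": {"field": field}}],
            "must_not": [{"term": {field: ""}}],
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the search endpoint.
    print(json.dumps({"query": empty_query("someField")}, indent=2))
```

With a builder like this, the same two functions can be used for dates, integers, and strings alike; for non-string types the term branch on "" is not expected to contribute matches, which is an assumption worth checking against your mapping.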

0

It's also ok to use query_string:

"query_string": { "query": "someField:\"\"" }

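The quoting in that clause is easy to get wrong by hand, so one option is to let a JSON serializer do the escaping. This is a minimal Python sketch; the helper name `empty_string_query_string` is hypothetical, and whether the resulting clause actually matches empty values depends on how the field is mapped and analyzed, so treat it as something to verify on your own index.

```python
import json
from typing import Any, Dict


def empty_string_query_string(field: str) -> Dict[str, Any]:
    """Build a query_string clause whose expression is <field>:"" (an empty phrase)."""
    return {"query_string": {"query": f'{field}:""'}}


if __name__ == "__main__":
    # json.dumps escapes the inner double quotes for us.
    print(json.dumps(empty_string_query_string("someField")))
    # prints: {"query_string": {"query": "someField:\"\""}}
```
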