
When I send this request, I get the mappings of the index:

GET /users

And it returns this:

{
   "user":{
      "mappings":{
         "skill":{
            "properties":{
               ...
               "Rouge":{
                  "type":"float"
               },
               "Ruby":{
                  "type":"float"
               },
               "Rust":{
                  "type":"float"
               },
               "SAS":{
                  "type":"float"
               },
               "SASS":{
                  "type":"float"
               },
               "SCSS":{
                  "type":"float"
               },
               ...
               "settings":{
                  "index":{
                     "creation_date":"1584415338201",
                     "number_of_shards":"5",
                     "number_of_replicas":"0",
                     "provided_name":"user"
                  }
               }
            }
         }
      }
   }
}

The problem is that some of the fields are empty, and I want to find them.

For example, no one has a value for Ruby. I can check whether one specific field is empty/null, but I need a query that finds all the empty fields at once, and unfortunately I couldn't find anything about that online.

Of course, I could fetch all the field names and run an empty-check query on each of them, but that is probably a bad idea. Do you know a better way to do it?

I am using Elasticsearch 6.8.

  • Are you looking for fields that are always empty (never used)? So Ruby counts because it never has a value in any document, but Rouge doesn't count if it has a value in even one doc? Commented May 27, 2020 at 14:44
  • I expect to see Ruby in the results if no document has a Ruby value, regardless of the other fields. Assume that I have 3 fields (let's say A, B and C) and 3 docs: one has values for A and B, one has only A, and one has only B. In this case I expect to see C in the result, because no document has a value for C. Commented May 27, 2020 at 14:50

1 Answer

I think you can use aggregations to achieve that. It's not a straightforward solution, and you need to write out every field name, but it can be helpful.

GET users/_search
{
  "size": 0,
  "aggs": {
    "Rouge": {
      "value_count": {
        "field": "Rouge"
      }
    },
    "Ruby": {
      "value_count": {
        "field": "Ruby"
      }
    },
    "Rust": {
      "value_count": {
        "field": "Rust"
      }
    },
    "SAS": {
      "value_count": {
        "field": "SAS"
      }
    },
    "SASS": {
      "value_count": {
        "field": "SASS"
      }
    },
    "SCSS": {
      "value_count": {
        "field": "SCSS"
      }
    }
  }
}

If a field has no value in any document, its aggregation appears as "value": 0, like this (only the Ruby aggregation is shown):

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "Ruby": {
      "value": 0
    }
  }
}
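
Since writing out every field name by hand gets tedious, the request body can also be generated. Below is a minimal Python sketch, assuming you already have the field names as a list (for example, pulled from the mapping response). The helper names `build_value_count_aggs` and `empty_fields` and the sample response are made up for illustration; they are not part of any Elasticsearch client.

```python
def build_value_count_aggs(field_names):
    """Build a _search body with one value_count aggregation per field."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            name: {"value_count": {"field": name}} for name in field_names
        },
    }


def empty_fields(response):
    """Return the names of aggregations whose value_count is 0."""
    aggs = response.get("aggregations", {})
    return sorted(name for name, result in aggs.items()
                  if result.get("value") == 0)


# Field names taken from the mapping shown in the question.
fields = ["Rouge", "Ruby", "Rust", "SAS", "SASS", "SCSS"]
body = build_value_count_aggs(fields)

# Hypothetical response in the shape shown above (values are invented).
sample_response = {
    "aggregations": {
        "Rouge": {"value": 3},
        "Ruby": {"value": 0},
        "Rust": {"value": 12},
    }
}
print(empty_fields(sample_response))  # ['Ruby']
```

You would send `body` to `GET users/_search` with whatever HTTP client you use and pass the parsed JSON response to `empty_fields`.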

1 Comment

It works, but unfortunately it's pretty slow. I have 350 fields and 7 million documents, and the query takes around 20 seconds.
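
One possible way around the slowness, offered as an untested sketch and not a measured fix: instead of counting every value, send one `exists` query per field through `_msearch` with `"terminate_after": 1`, so each search may stop after the first matching document on each shard. The Python below only builds the newline-delimited request body and reads the result. The helper names and the sample response are hypothetical, and whether this actually beats the aggregation on 350 fields is an assumption you would need to verify on your own cluster.

```python
import json


def build_msearch_body(index, field_names):
    """Build an NDJSON _msearch body: one header line and one query per field."""
    lines = []
    for name in field_names:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({
            "size": 0,
            "terminate_after": 1,  # stop collecting after one doc per shard
            "query": {"exists": {"field": name}},
        }))
    return "\n".join(lines) + "\n"  # _msearch requires a trailing newline


def fields_without_values(field_names, msearch_response):
    """Return fields whose exists query matched nothing.

    Relies on _msearch returning responses in request order.
    """
    empty = []
    for name, resp in zip(field_names, msearch_response["responses"]):
        total = resp["hits"]["total"]
        if isinstance(total, dict):  # 7.x and later wrap the total in an object
            total = total["value"]
        if total == 0:
            empty.append(name)
    return empty


fields = ["Rouge", "Ruby"]
ndjson = build_msearch_body("users", fields)

# Hypothetical 6.8-style response (values are invented).
sample = {"responses": [{"hits": {"total": 1}}, {"hits": {"total": 0}}]}
print(fields_without_values(fields, sample))  # ['Ruby']
```

The body would go to `POST /_msearch` with the `Content-Type: application/x-ndjson` header.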
