I need to prevent fields whose values are "null" (null as a string) or "" (an empty string) from being indexed in Elasticsearch, i.e. when I fetch a document, _source should contain all of its fields except the ones with such values. I am using a normalizer as below:

{
  "analysis": {
    "normalizer": {
      "my_normalizer": {
        "filter": [
          "uppercase"
        ],
        "type": "custom"
      }
    }
  }
}

Are any additional settings required above or in the field mappings?

P.S. I am using Elasticsearch 7.6.1.

1 Answer

You can have a look at Elasticsearch ingest pipelines. They are applied before indexing (and, in your case, analysis) takes place.

Concretely, you could add an ingest pipeline that removes the fields in question when they meet the conditions you listed. Something like:

PUT _ingest/pipeline/remove_invalid_value
{
  "description": "my pipeline that removes empty strings and null strings",
  "processors": [
    {
      "remove": {
        "field": "field1",
        "ignore_missing": true,
        "if": "ctx.field1 == \"null\" || ctx.field1 == \"\""
      }
    },
    {
      "remove": {
        "field": "field2",
        "ignore_missing": true,
        "if": "ctx.field2 == \"null\" || ctx.field2 == \"\""
      }
    },
    {
      "remove": {
        "field": "field3",
        "ignore_missing": true,
        "if": "ctx.field3 == \"null\" || ctx.field3 == \"\""
      }
    }
  ]
}
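
To check the pipeline before wiring it into an index, the simulate API can run it against a sample document. This is only a sketch: the index name, document ID and field values below are made up for illustration. With the conditions above, the simulated document should come back with field1 and field2 removed and field3 untouched.

```
POST _ingest/pipeline/remove_invalid_value/_simulate
{
  "docs": [
    {
      "_index": "my-index",
      "_id": "1",
      "_source": {
        "field1": "null",
        "field2": "",
        "field3": "some value"
      }
    }
  ]
}
```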

Then, you can either specify the pipeline in each index request or set it as the default_pipeline or final_pipeline in your index settings. You can also put this setting in an index template.
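
As a rough sketch of both options, assuming a hypothetical existing index called my-index: the first request sets the pipeline as the index default, the second passes it explicitly on a single index request through the pipeline query parameter. The document body is illustrative only.

```
PUT my-index/_settings
{
  "index.default_pipeline": "remove_invalid_value"
}

PUT my-index/_doc/1?pipeline=remove_invalid_value
{
  "field1": "null",
  "field2": "",
  "field3": "some value"
}
```

With the default pipeline set, the pipeline parameter on the second request is not needed; it is shown here only to illustrate the per-request form.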

(Script) Loop Approach

If you don't want to write a long list of remove actions, you can try to use a script processor, something like this:

PUT _ingest/pipeline/remove_invalid_fields
{
  "description": "remove fields",
  "processors": [
    {
      "script": {
        "source": """
          for (x in params.to_delete_on_condition) {
            if (ctx[x] == "null" || ctx[x] == "") {
              ctx.remove(x);
            }
          }
        """,
        "params": {
          "to_delete_on_condition": [
            "field1",
            "field2",
            "field3"
          ]
        }
      }
    }
  ]
}

It iterates over the list and removes the field if the condition matches.
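
If the field names are not known in advance, a possible variation is to walk over every top-level entry of the document instead of a fixed list. This is an untested sketch, not part of the pipeline above: the pipeline name remove_invalid_fields_all is made up, and note that in an ingest script ctx also carries metadata such as _index and _id, which are only left alone here because their values do not match the condition.

```
PUT _ingest/pipeline/remove_invalid_fields_all
{
  "description": "remove every top-level field holding \"null\" or an empty string",
  "processors": [
    {
      "script": {
        "source": """
          // Collect the matching keys first, so the map is not
          // modified while it is being iterated.
          def toRemove = [];
          for (entry in ctx.entrySet()) {
            def value = entry.getValue();
            if (value == "null" || value == "") {
              toRemove.add(entry.getKey());
            }
          }
          for (key in toRemove) {
            ctx.remove(key);
          }
        """
      }
    }
  ]
}
```

This only looks at top-level fields; values inside objects or arrays are not inspected.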

Accessing nested fields in scripts is not trivial, as reported in many answers, but it should be doable. The idea is that nested.field should be accessed as ctx['nested']['field'].
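
A minimal sketch of that idea, assuming a document with an object field literally named nested that contains a key named field (both names, and the pipeline name, are placeholders). The parent object is checked first so the script does not fail on documents where it is missing.

```
PUT _ingest/pipeline/remove_invalid_nested_field
{
  "description": "remove nested.field when it holds \"null\" or an empty string",
  "processors": [
    {
      "script": {
        "source": """
          // ctx['nested'] is null when the parent object is absent.
          def parent = ctx['nested'];
          if (parent instanceof Map) {
            def value = parent['field'];
            if (value == "null" || value == "") {
              parent.remove('field');
            }
          }
        """
      }
    }
  ]
}
```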

2 Comments

Thanks for the response, @Briomkez. I have multiple fields in the index, so is there a way to apply these processors to all the fields in the document?
Check out my edit. I tried to address the question in your comment :)
