You can have a look at Elasticsearch ingest pipelines. They are applied before indexing (and, in your case, before analysis) takes place.
Concretely, you could add an ingest pipeline that removes the offending fields when they meet the conditions you listed. Something like:
PUT _ingest/pipeline/remove_invalid_value
{
  "description": "my pipeline that removes empty strings and 'null' strings",
  "processors": [
    {
      "remove": {
        "field": "field1",
        "ignore_missing": true,
        "if": "ctx.field1 == \"null\" || ctx.field1 == \"\""
      }
    },
    {
      "remove": {
        "field": "field2",
        "ignore_missing": true,
        "if": "ctx.field2 == \"null\" || ctx.field2 == \"\""
      }
    },
    {
      "remove": {
        "field": "field3",
        "ignore_missing": true,
        "if": "ctx.field3 == \"null\" || ctx.field3 == \"\""
      }
    }
  ]
}
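You can test the pipeline with the simulate API before wiring it up; the sample document below is made up:
POST _ingest/pipeline/remove_invalid_value/_simulate
{
  "docs": [
    {
      "_source": {
        "field1": "null",
        "field2": "",
        "field3": "keep me"
      }
    }
  ]
}
In the response, field1 and field2 should be gone, while field3 is left untouched.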
Then you can either specify the pipeline in the index request or set it as the default_pipeline or final_pipeline in your index settings. You can also specify this setting in an index template.
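For example, assuming a hypothetical index called my-index:
PUT my-index/_doc/1?pipeline=remove_invalid_value
{
  "field1": "null",
  "field2": "some value"
}

PUT my-index/_settings
{
  "index.default_pipeline": "remove_invalid_value"
}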
(Script) Loop Approach
If you don't want to write a long list of remove processors, you can use a script processor instead, something like this:
PUT _ingest/pipeline/remove_invalid_fields
{
  "description": "remove fields",
  "processors": [
    {
      "script": {
        "source": """
          // remove each listed field when it holds the literal "null" or an empty string
          for (x in params.to_delete_on_condition) {
            if (ctx[x] == "null" || ctx[x] == "") {
              ctx.remove(x);
            }
          }
        """,
        "params": {
          "to_delete_on_condition": [
            "field1",
            "field2",
            "field3"
          ]
        }
      }
    }
  ]
}
It iterates over the list and removes each listed field whose value matches the condition.
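You can verify it with the same simulate call as before, just pointing at this pipeline:
POST _ingest/pipeline/remove_invalid_fields/_simulate
{
  "docs": [
    {
      "_source": {
        "field1": "null",
        "field2": "",
        "field3": "keep me"
      }
    }
  ]
}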
Accessing nested fields in scripts is not trivial, as reported in many answers, but it is doable: a field like nested.field has to be accessed as ctx['nested']['field'].
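As a minimal sketch, assuming a single nesting level and a hypothetical field nested.field, the Painless condition would look like this (the null check on the parent object avoids a NullPointerException when it is missing):
if (ctx['nested'] != null && (ctx['nested']['field'] == "null" || ctx['nested']['field'] == "")) {
  ctx['nested'].remove('field');
}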