
When I sort on a string field that contains multiple words, Elasticsearch splits the string value into terms and uses the min or max term as the sort value. For example, when sorting on a field with the value "Eye of the Tiger" in ascending order, the sort value is "Eye"; in descending order it is "Tiger".

Let's say I have "Eye of the Tiger" and "Wheel of Death" as entries in my index. When I do an ascending sort on this field, I would expect "Eye of the Tiger" to come first, since "E" comes before "W". Instead, "Wheel of Death" comes up first, because "Death" is the min term of that value and "Eye" is the min term of "Eye of the Tiger".
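The behavior described above can be reproduced outside Elasticsearch with a rough simulation. This is plain Python, not Elasticsearch code: `analyze` and `sort_key` are hypothetical helpers, and splitting on whitespace plus lowercasing is only an approximation of what the analyzer does to the field.

```python
# Rough simulation of sorting on an analyzed (tokenized) string field.
# Hypothetical helpers; this only approximates Elasticsearch's analysis.

def analyze(value):
    """Approximate tokenization: split on whitespace and lowercase each term."""
    return [token.lower() for token in value.split()]

def sort_key(value, order="asc"):
    """Ascending sorts use the smallest term, descending sorts the largest."""
    terms = analyze(value)
    return min(terms) if order == "asc" else max(terms)

titles = ["Eye of the Tiger", "Wheel of Death"]

# "death" < "eye", so "Wheel of Death" sorts first in ascending order.
ascending = sorted(titles, key=sort_key)
print(ascending)
```

Under these assumptions the ascending order is "Wheel of Death" followed by "Eye of the Tiger", which matches what the question reports.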

Does anyone know how to turn off this behavior and just allow a regular sort on this string field?

  • How is that field mapped? Sounds like it is tokenized into pieces rather than indexed as a whole string. You might need "index": "not_analyzed". Commented Jan 27, 2014 at 19:51
  • The field is mapped as a string. So I would basically need to do that for every field I want to sort on that contains multiple terms? I was doing some more digging and came across this one: stackoverflow.com/questions/10583013/… which sounds similar to what you're suggesting. Is this the only way in Elasticsearch? It just feels pretty clunky. Commented Jan 27, 2014 at 20:03
  • Here is a helpful blog post related to the topic: awesomism.co.uk/sorting-string-fields-with-elasticsearch Commented Aug 5, 2014 at 16:55

2 Answers


As mconlin mentioned, if you want to sort on the unanalyzed field value, you need to specify "index": "not_analyzed" in the field's mapping. But if you also want to keep this field tokenized for searching, this post by sloan shows a great example. Using a multi-field to keep two different mappings for the same field is very common in Elasticsearch.
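As a minimal sketch of the multi-field idea, the mapping and a matching search body can be written out as Python dictionaries and dumped to JSON. The field name `title` and the sub-field name `raw` are hypothetical, not taken from the question, and the syntax assumes the same `string`-type mappings used elsewhere in this thread.

```python
import json

# Hypothetical names: "title" (the searchable field) and "raw" (its sort sub-field).
mapping = {
    "title": {
        "type": "string",
        "analyzer": "standard",           # tokenized copy, used for full-text search
        "fields": {
            "raw": {
                "type": "string",
                "index": "not_analyzed",  # whole value kept as a single term
            }
        },
    }
}

# Search on the analyzed field, sort on the unanalyzed sub-field.
search_body = {
    "query": {"match": {"title": "tiger"}},
    "sort": [{"title.raw": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

The point of the design is that one source value is indexed twice: queries run against `title`, while sorting uses `title.raw`, so "Eye of the Tiger" is compared as a whole string.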

Hope this helps, let me know if I can offer more explanation.



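If you want the sorting to be case-insensitive, "index": "not_analyzed" doesn't work, because the value is indexed with its original casing. So I've created a custom sort analyzer that keeps the whole value as one token and lowercases it.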

index-settings.yml

index :   
    analysis :
        analyzer :
            sort :
                type : custom
                tokenizer : keyword
                filter : [lowercase]

Mapping:

...
"articleName": {
    "type": "string",
    "analyzer": "standard",
    "fields": {
        "sort": {
            "type": "string",
            "analyzer": "sort"
        }
    }
}
...
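To illustrate why the lowercase filter matters, here is a rough Python simulation of the two approaches. It is not Elasticsearch code: `sort_analyzer` is a hypothetical stand-in for the keyword tokenizer plus lowercase filter, and plain `sorted` stands in for sorting on a not_analyzed field. The sample titles are made up.

```python
def sort_analyzer(value):
    """Stand-in for the custom analyzer: one token for the whole value, lowercased."""
    return [value.lower()]

# Made-up sample data with mixed casing.
titles = ["wheel of Death", "Eye of the Tiger", "apple Pie"]

# Case-sensitive comparison: uppercase letters sort before lowercase ones.
not_analyzed_order = sorted(titles)

# Case-insensitive comparison using the single lowercased token.
custom_order = sorted(titles, key=lambda value: sort_analyzer(value)[0])

# Sort clause targeting the "sort" sub-field from the mapping above.
sort_body = {"sort": [{"articleName.sort": {"order": "asc"}}]}

print(not_analyzed_order)
print(custom_order)
```

With the case-sensitive comparison, "Eye of the Tiger" lands ahead of "apple Pie"; with the lowercased token, "apple Pie" comes first, which is the ordering most people expect.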
