String.hashCode in Java [duplicate]

Question

Possible Duplicate:
why do String.hashCode() in java is not implemented in a way with less conflicts?

For non-cryptographic hashes, how does Java's String.hashCode() perform?

Mostly I'm concerned about collisions.

Thanks.

This is probably impossible to answer without you telling us what strings you are interested in. Over 10-char strings, all reasonable 32-bit hashes have the same number of collisions, namely 256^10-256^4.
– Pascal Cuoq, Commented Jan 5, 2013 at 1:43
As expected, quite well. You can read the source code. Collisions are rare, even though normal ASCII text carries closer to 6 bits per byte and hashCode multiplies by 31.
– Joop Eggen, Commented Jan 5, 2013 at 1:44

fge · Accepted Answer · 2013-01-05 02:26:04Z

You seem to be misunderstanding what .hashCode() is for in Java, and more specifically the .equals()/.hashCode() contract specified by java.lang.Object.

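The only part of the contract that matters here is that if two objects are equal according to .equals(), then they must return the same hash code from .hashCode(). Apart from returning a consistent value for an unchanged object within one run, the contract imposes no other obligation; in particular, unequal objects are allowed to share a hash code.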
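To make the contract concrete, here is a minimal sketch (the class name HashContractDemo and its helper are made up for illustration). It checks that equal strings share a hash code, and that the converse does not hold: "Aa" and "BB" are different strings with the same hash code.

```java
// Hypothetical demo class; only String.equals()/hashCode() come from the JDK.
class HashContractDemo {
    // True when the pair satisfies the contract:
    // equal objects must have equal hash codes (nothing is required otherwise).
    static boolean obeysContract(Object a, Object b) {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = new String("hello");
        // Distinct objects with equal contents: hash codes must match.
        System.out.println(s1.equals(s2) && s1.hashCode() == s2.hashCode()); // true

        // Unequal strings may still collide; the contract allows this.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println(obeysContract("Aa", "BB"));           // true
    }
}
```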

It is therefore perfectly legal to write a custom .hashCode() implementation like this, even though this is as suboptimal as one can think of:

@Override
public int hashCode()
{
    // Legal, but useless
    return 42;
}
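As a rough illustration of why this is legal but useless, the sketch below puts such keys into a HashSet. ConstantHashKey is a hypothetical class invented for this example. The set still behaves correctly because equals() decides membership, but every key lands in the same bucket, so lookups lose the usual constant-time behaviour.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type whose hashCode() is the constant from the answer above.
final class ConstantHashKey {
    private final String name;

    ConstantHashKey(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).name.equals(name);
    }

    @Override
    public int hashCode() {
        return 42; // legal: equal objects still get equal hash codes
    }

    public static void main(String[] args) {
        Set<ConstantHashKey> set = new HashSet<>();
        set.add(new ConstantHashKey("a"));
        set.add(new ConstantHashKey("b"));
        set.add(new ConstantHashKey("a")); // duplicate, rejected via equals()
        System.out.println(set.size());                             // 2
        System.out.println(set.contains(new ConstantHashKey("b"))); // true
    }
}
```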

Of course, JDK developers would never be that thick, and .hashCode() implementations for builtin types (including String) are good enough that you do not usually need to worry about collisions. For String the formula is even spelled out in the Javadoc (s[0]*31^(n-1) + ... + s[n-1]), but in general the contract does not promise the same values from one JDK implementation to another, and none of these implementations has any "cryptographic value".

But that's not the point.

The most important thing to consider is that .hashCode() has nothing to do with cryptography at all. Its only obligation is to obey the contract defined by java.lang.Object.

mikera · Accepted Answer · 2013-01-05 01:47:34Z

It's pretty good as a general-purpose hash function, i.e. you usually shouldn't worry about it.

In particular:

It is fast, to the extent that it can probably produce hashes about as fast as the CPU can read the String from memory (i.e. you usually can't do better without skipping large parts of the String). It does just one multiply and one add per character in the String.
For typical sets of random Strings, it produces well-distributed hashes over the entire int range.
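The "one multiply and one add per character" loop can be sketched as follows. The recurrence matches the formula given in the String.hashCode() Javadoc; the class name PolyHash is hypothetical, and the real JDK method may differ in details such as caching the result.

```java
// Hypothetical re-implementation for illustration; not the JDK source.
class PolyHash {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // using wrapping 32-bit int arithmetic.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // one multiply and one add per char
        }
        return h;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "hello", "String.hashCode in Java"};
        for (String s : samples) {
            // Each line should end with "true" if the sketch agrees with the JDK.
            System.out.println(s + " -> " + (hash(s) == s.hashCode()));
        }
    }
}
```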

Obviously, it is not a cryptographic hash function, so don't use it for that. Also, be aware that you likely will get hash collisions as it is producing a 32-bit hash. So you just need to design your algorithms to take that into account.
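For a rough sense of when collisions become likely, here is a sketch of the standard birthday-bound approximation. It assumes an idealized hash that spreads keys uniformly over all 2^32 values, which real strings only approximate; the class and method names are made up for this example.

```java
// Hypothetical helper; assumes uniformly random 32-bit hash codes.
class BirthdayBound {
    // Approximate probability that at least two of n keys share a hash code:
    // 1 - exp(-n(n-1) / (2 * 2^32)).
    static double collisionProbability(long n) {
        double buckets = 4294967296.0; // 2^32 possible int values
        double exponent = (double) n * (n - 1) / (2.0 * buckets);
        return -Math.expm1(-exponent); // equals 1 - e^(-exponent), numerically stable
    }

    public static void main(String[] args) {
        long[] sizes = {1_000L, 10_000L, 77_163L, 1_000_000L};
        for (long n : sizes) {
            System.out.printf("n=%d p=%.6f%n", n, collisionProbability(n));
        }
    }
}
```

Under this assumption, the chance of at least one collision reaches about one half at roughly 77,000 keys, so any algorithm handling more keys than that should expect collisions and resolve them with equals().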


3 Comments

om-nom-nom Over a year ago

Don't think I'm going to bully you, but your post says ... uhm, nothing: you will likely get collisions, you shouldn't worry about them, and so on. Instead of saying those general things, can you please say at what point I need to start worrying? How likely (exactly) are hash collisions?

mikera Over a year ago

@om-nom-nom: if you want to know the exact likelihood of collisions, then you need to describe the distribution of Strings. You also need to describe your use case. These are different questions from what was asked. I can tell you that Java's hashCode is optimal, in the sense that it produces the minimal number of collisions possible for a 32-bit hashCode given totally random Strings.

caduceus Over a year ago

@mikera you're right, even Integer.MIN_VALUE stackoverflow.com/questions/74435861/…
