0

Possible Duplicate:
why do String.hashCode() in java is not implemented in a way with less conflicts?

For non-cryptographic hashes, how does Java's String.hashCode() perform?

Mostly I'm concerned about collisions.

Thanks.

3
  • I've rephrased question since Scala uses Java strings. Commented Jan 5, 2013 at 1:40
  • This is probably impossible to answer without you telling us what strings you are interested in. Over 10-char strings, all reasonable 32-bit hashes have the same number of collisions, namely 256^10-256^4. Commented Jan 5, 2013 at 1:43
  • As expected, quite well. You can read the source code. Collisions are rare, even when normal ASCII is more 6 bit per byte, and hashCode uses *31. Commented Jan 5, 2013 at 1:44

2 Answers 2

4

You seem to be misunderstanding what .hashCode() is for with regards to Java, and more specifically, the .equals()/.hashCode() contract specified by java.lang.Object.

The only part of the contract of matter to anyone is that if two objects are equal with regards to .equals(), then they must have the same hash code as returned by .hashCode(). There is no other obligation to that contract.

It is therefore perfectly legal to write a custom .hashCode() implementation like this, even though this is as suboptimal as one can think of:

@Override
public int hashCode()
{
    // Legal, but useless
    return 42;
}

Of course, JDK developers would never be that thick, and .hashCode() implementations for builtin types (including String) are good enough that you do not even need to worry about collisions. Even then, this implementation will more than likely vary from one JDK implementation to another, and so will its "cryptographic value".

But that's not the point.

The most important thing to consider is that .hashCode() has nothing to do with cryptography at all. Its only obligation is to obey the contract defined by java.lang.Object.

Sign up to request clarification or add additional context in comments.

Comments

0

It's pretty good as a general purpose hash function. i.e. you shouldn't usually worry about it.

In particular:

  • It is fast, to the extent that it probably produces hashes as the CPU can read the String from memory (i.e. you usually can't get better without skipping large parts of the String). It does just one multiply and one add per character in the String.
  • For typical sets of random Strings, it produces well-distributed hashes over the entire int range.

Obviously, it is not a cryptographic hash function, so don't use it for that. Also, be aware that you likely will get hash collisions as it is producing a 32-bit hash. So you just need to design your algorithms to take that into account.

3 Comments

Don't think I'm going to bully you, but your posts says ... uhm, nothing: you likely get collisions, you shouldn't worry about and so on. Instead of saying those general things, can you, please, say where is the point, when I need to start worrying about? How likely (exactly) hash collisions occurs?
@om-nom-nom: if you want to know the exact likelihood of collisions, then you need to describe the distribution of Strings. You also need to describe your use case. These are different questions from what was asked. I can tell you that Java's hashCode is optimal, in the sense that it produces the minimal number of collisions possible for a 32-bit hashCode given totally random Strings.
@mikera you're right, even Integer.MIN_VALUE stackoverflow.com/questions/74435861/…

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.