9

Looking at Java's String class, we can see that the hash code is cached after it is first computed.

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Here hash is an instance variable. My question: why do we need the extra local variable h?

2
  • 1
    It is written like this to ensure that the String class is thread-safe. You can read more about this concept here Commented Apr 27, 2017 at 9:29
  • 1
    That Wikipedia link doesn't fully explain what's going on here or why. Commented Apr 27, 2017 at 10:41

3 Answers

6

Simply because the value changes on every iteration of the loop, and your solution without an intermediate temporary variable, which would update hash directly, is not thread-safe. Consider this method being invoked from several threads.

Say thread-1 has started the hash computation, so hash is not 0 anymore. A moment later thread-2 invokes hashCode() on the same object and sees that hash is not 0, although thread-1 hasn't finished its computation yet. As a result, thread-2 will use a wrong (not fully computed) hash value.
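To make the difference concrete, here is a minimal sketch that contrasts the two variants. The classes UnsafeText and SafeText are hypothetical stand-ins written for illustration, not the JDK source; only the shape of hashCode() follows the snippet in the question.

```java
public class HashCaching {

    // Hypothetical variant WITHOUT the local variable: the shared field
    // holds partial results while the loop is running.
    static final class UnsafeText {
        private final char[] value;
        private int hash; // shared between threads, defaults to 0

        UnsafeText(String s) { this.value = s.toCharArray(); }

        @Override
        public int hashCode() {
            if (hash == 0 && value.length > 0) {
                for (int i = 0; i < value.length; i++) {
                    // After the first iteration hash is usually non-zero, so
                    // another thread can skip the loop and return this
                    // half-finished value.
                    hash = 31 * hash + value[i];
                }
            }
            return hash;
        }
    }

    // Variant WITH the local variable, following the pattern in the question:
    // the field is written once, with the finished value.
    static final class SafeText {
        private final char[] value;
        private int hash;

        SafeText(String s) { this.value = s.toCharArray(); }

        @Override
        public int hashCode() {
            int h = hash; // single read of the shared field
            if (h == 0 && value.length > 0) {
                for (int i = 0; i < value.length; i++) {
                    h = 31 * h + value[i]; // partial results stay on this thread's stack
                }
                hash = h; // publish only the complete result
            }
            return h;
        }
    }

    public static void main(String[] args) {
        // Single-threaded, both variants agree with String.hashCode().
        System.out.println(new UnsafeText("hello").hashCode() == "hello".hashCode());
        System.out.println(new SafeText("hello").hashCode() == "hello".hashCode());
    }
}
```

In single-threaded use both variants return the same result; the difference only shows up when a second thread reads the field while the loop in UnsafeText is still running.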


6

It's a simple and cheap way to keep the method thread-safe without any locking.

If one thread invokes hashCode() for the first time and a second thread invokes it while the first is still calculating the hash, the second thread would return an incorrect hash (an intermediate value of the first thread's calculation) if the loop used the attribute directly.

2 Comments

Note that the thread safety here does not prevent the hash from being calculated by more than one thread. Because there is no synchronization mechanism, there is no guarantee a) that two threads won't read hash while it's still 0, nor b) that, even after one thread has cached the value, any other thread will see the result. Why is it thread-safe despite possibly being calculated many times? Because the calculation is idempotent; no two threads can calculate different values.
Totally right, Lew. In that case, calculating the hash twice has a minor cost at the beginning compared with the benefit of not needing any synchronization mechanism during the life of the string.
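The point made in these two comments can be sketched as follows. CachedText and hashFromManyThreads are hypothetical names introduced for this example; the assumption is only that the class caches its hash the same way as the snippet in the question. Several threads may each run the loop, but every one of them ends up with the same value.

```java
import java.util.concurrent.CountDownLatch;

public class BenignRaceDemo {

    // Hypothetical class that caches its hash with the pattern from the question.
    static final class CachedText {
        private final char[] value;
        private int hash; // no volatile, no lock

        CachedText(String s) { this.value = s.toCharArray(); }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0 && value.length > 0) {
                for (char c : value) {
                    h = 31 * h + c;
                }
                hash = h;
            }
            return h;
        }
    }

    // Starts threadCount threads that call hashCode() on one shared object
    // at roughly the same time and returns what each thread observed.
    static int[] hashFromManyThreads(String s, int threadCount) {
        CachedText text = new CachedText(s);
        int[] results = new int[threadCount];
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // line the threads up to encourage overlap
                } catch (InterruptedException e) {
                    throw new IllegalStateException(e);
                }
                results[slot] = text.hashCode();
            });
            threads[i].start();
        }
        start.countDown();
        try {
            for (Thread t : threads) {
                t.join(); // join makes each thread's result visible here
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        int expected = "hello".hashCode();
        for (int observed : hashFromManyThreads("hello", 8)) {
            System.out.println(observed == expected);
        }
    }
}
```

Some of the eight threads may repeat the computation, which is the minor cost mentioned above, but none of them can observe anything other than 0 or the finished value in the field.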
2

To put it very simply: the local primitive h is, well, local, and thus thread-safe, as opposed to hash, which is shared.
