16

I'm reading Effective C# and there is a comment about Object.GetHashCode() that I didn't understand:

Object.GetHashCode() uses an internal field in the System.Object class to generate the hash value. Each object created is assigned a unique object key, stored as an integer, when it is created.
These keys start at 1 and increment every time a new object of any type gets created. The object identity field is set in the System.Object constructor and cannot be modified later. Object.GetHashCode() returns this value as the hash code for a given object.

I tried to look at the documentation of Object.GetHashCode() and didn't find any information about this.

I wrote the simple piece of code to print the hash code of newly generated objects:

using System;

namespace TestGetHashCode
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                object o = new object();
                Console.WriteLine(o.GetHashCode());
            }
        }
    }
}

The first few numbers that were printed were:

37121646,
45592480,
57352375,
2637164,
41014879,
3888474,
25209742,
26966483,
31884011

Which didn't seem to fit that

These keys start at 1 and increment every time a new object of any type gets created...Object.GetHashCode() returns this value

Then, in order to find this "internal field in the System.Object" I tried using ReSharper decompiled sources but the code I found was

[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
[__DynamicallyInvokable]
public virtual int GetHashCode()
{
  return RuntimeHelpers.GetHashCode(this);
}

and again using decompiled sources I found that RuntimeHelpers.GetHashCode was implemented as

[SecuritySafeCritical]
[__DynamicallyInvokable]
[MethodImpl(MethodImplOptions.InternalCall)]
public static int GetHashCode(object o);

following the MethodImpl attribute it seems that I can't view the implementation and this is a dead end for me.

Can someone please explain the comment by the author (the first quote) ?

What is the internal field within the Object class and how it is used for the implementation of the Object.GetHashCode()?

11
  • This can be usefull: stackoverflow.com/questions/720177/… Commented Nov 28, 2014 at 21:08
  • @Yuval Itzchakov: is it irony? Otherwise, why do you think so? Commented Nov 28, 2014 at 21:20
  • 1
    @Yuval Itzchakov: why do you think GetHashCode() has to return a unique identifier? It's called Get*Hash*Code() not Get*UniqueIdentifier*() with purpose Commented Nov 28, 2014 at 21:32
  • 1
    @Yuval Itzchakov: "will generate equality for all returned objects" --- that's incorrect. You should never treat objects equal if their hash codes match. It actually is the opposite: the equal objects MUST return equal hash codes, but equal hash codes does not mean the objects are equal. msdn.microsoft.com/en-us/library/… "You should not assume that equal hash codes imply object equality." Commented Nov 28, 2014 at 21:48
  • 1
    @Yuval Itzchakov: "who look solely at hashcode equality" --- no one should do that. If someone does something stupid - it's not the reason to encourage them for continuing doing that. It's not the case for .net. If it's the case for a 3rd party library - it must be reported as a bug and fixed. Commented Nov 28, 2014 at 21:52

1 Answer 1

22

Okay, I'd better write this up. The book is very inaccurate. The value for Object.GetHashCode() is generated inside the CLR and is calculated on demand, whenever GetHashCode() is called the first time. I'll quote the code from the SSCLI20 distribution, clr/src/vm/thread.h has the function that produces the number, it looks like this (edited for readability):

inline DWORD GetNewHashCode()
{
    // Every thread has its own generator for hash codes so that we won't get into a 
    // situation where two threads consistently give out the same hash codes.
    // Choice of multiplier guarantees period of 2**32
    // see Knuth Vol 2 p16 (3.2.1.2 Theorem A).
    DWORD multiplier = m_ThreadId*4 + 5;
    m_dwHashCodeSeed = m_dwHashCodeSeed*multiplier + 1;
    return m_dwHashCodeSeed;
}

After which it is stored in the so-called sync block of the object so subsequent calls return the same value. Only 26 of the generated 32 bits are actually stored, the sync block needs space for some status bits. Still plenty good enough to generate a very high quality hash code, collisions are quite rare.

The presence of the m_ThreadId variable in that code can use an explanation. The random number generator seed is stored for each individual thread. A trick to avoid having to take a lock.

The m_dwHashCodeSeed is initialized in the Thread constructor like this:

   // Initialize this variable to a very different start value for each thread
   // Using linear congruential generator from Knuth Vol. 2, p. 102, line 24
   dwHashCodeSeed = dwHashCodeSeed * 1566083941 + 1;
   m_dwHashCodeSeed = dwHashCodeSeed;

with:

   static  DWORD dwHashCodeSeed = 123456789;
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you for the answer. From this code it seems that GetNewHashCode is a function that depends only on m_ThreadId so for the same thread it generates the same hash code in every call. What am I missing here ?
No, it depends on m_dwHashCodeSeed, updated each time a new hash code is generated. Every thread starts out with a different seed, bottom snippet. m_ThreadId just adds an extra level of randomness, Donald Knuth helped with that. Don't focus too much on threads being involved with this, it is just a lock-less code trick. Locks are expensive. There is some trickery involved with two threads calling GetHashCode at the same time, I left that out to keep it semi-comprehensible.
I understand how this works and thank you for the write up, but why in the world does it work like this? "Hoping" for no conflicts like they're doing is not needed right? Why not generate a hash for the object based on that object's unique identifier? Or, if none exist, use a UID that increments by 1 every time and store that in the SYNC block? It's the same as the above solution with just the absolute minimum rate of conflicts possible (conflicts only when the UID overflows)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.