
The implementation of Nullable<T>.GetHashCode() is as follows:

public override int GetHashCode()
{
    if (!this.HasValue)
    {
        return 0;
    }
    return this.value.GetHashCode();
}

If, however, the underlying value also generates a hash code of 0 (e.g. a bool set to false or an Int32 set to 0), then we have two commonly occurring, different object states with the same hash code. It seems to me that a better implementation would have been something like:

public override int GetHashCode()
{
    if (!this.HasValue)
    {
        return unchecked((int)0xD523648A); // E.g. some arbitrary 32-bit value with a good
                                           // mix of set and unset bits (the unchecked cast
                                           // is needed because the literal exceeds int.MaxValue).
    }
    return this.value.GetHashCode();
}
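
For reference, here is a quick sketch (the HashDemo harness is invented for illustration) confirming the collision described above: an int hashes to itself, so both the empty Nullable&lt;T&gt; and a zero value return 0.

using System;

class HashDemo
{
    static void Main()
    {
        int? noValue = null;
        int? zero = 0;

        // int.GetHashCode() returns the value itself, so 0 hashes to 0...
        Console.WriteLine(zero.GetHashCode());    // 0
        // ...which is exactly what the empty Nullable<T> returns as well:
        Console.WriteLine(noValue.GetHashCode()); // 0

        // Same collision for bool: false and "no value" both hash to 0.
        Console.WriteLine(((bool?)false).GetHashCode()); // 0
        Console.WriteLine(default(bool?).GetHashCode()); // 0
    }
}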

  • What makes you think that the underlying GetHashCode method is more likely to produce 0 than 0x12345678? Commented Nov 23, 2012 at 11:24
  • @stakx Because, for instance and just for instance, x.GetHashCode() == x for any int x. 0 is much more likely than a random constant for any given integer in real applications. Commented Nov 23, 2012 at 11:27
  • Is this question the result of an actual issue you ran into, or is it purely theoretical? In any situation where this has even the slightest performance implications on actual software, you should be looking at not using Nullable<T> at all. Commented Nov 23, 2012 at 11:29
  • Not unique, but it should produce as few collisions as possible. Commented Nov 23, 2012 at 11:30
  • @slawekwin: Yes, I happened to encounter 0xD523648A already twice today ;) Commented Nov 23, 2012 at 11:40

4 Answers


Yes, you do have a point. It is always possible to write a better GetHashCode() implementation if you know up front what data you are going to store. Not a luxury that a library writer ever has available. But yes, if you have a lot of bool? values that are either false or !HasValue, then the default implementation is going to hurt. The same goes for enums and ints; zero is a common value.
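
To make "going to hurt" concrete, here is a small sketch (the Flags type is invented for illustration) where a composite key combines two bool? fields in the usual way; every mix of false and null collapses to the same hash code:

using System;

// Invented composite key: combines two nullable flags the usual way.
struct Flags
{
    public bool? A;
    public bool? B;

    public override int GetHashCode()
    {
        // false.GetHashCode() == 0 and an empty bool? also returns 0, so
        // "false" and "not set" are indistinguishable to the hash.
        return A.GetHashCode() * 31 + B.GetHashCode();
    }
}

class FlagsDemo
{
    static void Main()
    {
        Console.WriteLine(new Flags { A = false, B = false }.GetHashCode()); // 0
        Console.WriteLine(new Flags { A = false, B = null }.GetHashCode());  // 0
        Console.WriteLine(new Flags { A = null,  B = false }.GetHashCode()); // 0
        Console.WriteLine(new Flags { A = null,  B = null }.GetHashCode());  // 0
    }
}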

Your argument is academic, however: changing the implementation costs minus ten thousand points, and you can't do it yourself. The best you can do is submit the suggestion; the proper channel is the UserVoice site. Getting traction on this is going to be difficult; good luck.


3 Comments

That sounds to me like: if you know your domain, you can most likely do better. If you want nullable structs and know you can do better than Nullable<T>, build it. You won't have compiler support for it, but hey, you're in a pretty specific situation already.
Nullable is a tough case; it is directly supported by the CLR to handle its unusual boxing behavior. Not that easy to replace.
Ah, interesting, I didn't know that; food for investigation :) It still leaves open the possibility of rolling your own, though (see the sketch below). The performance issue was GetHashCode(), not boxing/unboxing.
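
A minimal sketch of such a hand-rolled wrapper ("Maybe" is an invented name; as the comments note, you lose the compiler's T? syntax and the CLR's special boxing behavior, but the empty state gets a distinct hash code):

using System;
using System.Collections.Generic;

public readonly struct Maybe<T> where T : struct
{
    private readonly T value;

    public Maybe(T value)
    {
        this.value = value;
        HasValue = true;
    }

    public bool HasValue { get; }

    public T Value
    {
        get
        {
            if (!HasValue) throw new InvalidOperationException("Maybe<T> has no value.");
            return value;
        }
    }

    public override int GetHashCode()
    {
        // Reserve an arbitrary bit pattern for the empty state instead of 0.
        return HasValue
            ? EqualityComparer<T>.Default.GetHashCode(value)
            : unchecked((int)0xD523648A);
    }
}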

Let's first note that this question is just about performance. The hash code is not required to be unique or collision-resistant for correctness. It is helpful for performance, though.

Actually, this is the main value proposition of a hash table: Practically evenly distributed hash codes lead to O(1) behavior.

So what hash code constant is most likely to lead to the best possible performance profile in real applications?

Certainly not 0 because 0 is a common hash code: 0.GetHashCode() == 0. That goes for other types as well. 0 is the worst candidate because it tends to occur so often.

So how to avoid collisions? My proposal:

static readonly int nullableDefaultHashCode = GetRandomInt32(); // evaluated once per process

public override int GetHashCode()
{
    if (!this.HasValue)
        return nullableDefaultHashCode;
    else
        return this.value.GetHashCode();
}

Evenly distributed, unlikely to collide, and no stylistic problem of choosing an arbitrary constant.

Note that GetRandomInt32 could be implemented as return unchecked((int)0xD523648A);. It would still be more useful than return 0;. But it is probably best to query a cheap source of pseudo-random numbers.
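
For instance, a minimal sketch of that GetRandomInt32 (the helper body and the HashSeed container are assumptions; any cheap, well-mixed source of 32 bits would do):

using System;

static class HashSeed
{
    // Evaluated once per process by the type initializer.
    internal static readonly int NullableDefaultHashCode = GetRandomInt32();

    static int GetRandomInt32()
    {
        // A new GUID is a cheap source of a well-mixed 32-bit value;
        // a cryptographic RNG would work just as well here.
        return Guid.NewGuid().GetHashCode();
    }
}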

7 Comments

This just muddies the waters. If the performance of GetHashCode() is your issue, and you generate a default hash upon type initialization, then good luck tracking the performance problem across runs of your application. One run it's there; the next, poof, gone.
That's a good point. So set it in a constant, then. My main points still apply.
But when you set it in a constant, you're trying to solve a performance problem for every possible struct that any sentient being programming in C# might ever come up with.
@Willem: Zero is a constant.
My point is that non-zero numbers are likely, not guaranteed, to perform better. You are saying that there are cases where they don't. But those cases are clearly not as common. Why choose the worst possible number? Why choose the number that is maximally likely to collide?

In the end, a Nullable<T> without a value has to return a hash code, and that hash code should be a constant.

Returning an arbitrary constant may look safer or more appropriate, perhaps even more so when viewed within the specific case of Nullable<int>, but in the end it's just that: a hash.

And within the entire set that Nullable<T> can cover (which is infinite), zero is no better or worse a hash code than any other value.

4 Comments

You are excluding performance concerns from this answer. This question is just about performance.
@usr: And how would you implement generic performance for Nullable<T> over ALL possible structs in the entire universe?
It is not possible, but we can try to come close. Also see my answer, which goes further.
Any explanation for the downvotes? I'm not worried about them; I'd just like to hear the arguments. It's an interesting discussion :)

I don't understand the concern here - poor performance in what situation?

Why would you consider a hash function poor based on its result for one value?

I could see that it would be a problem if many different values of a Type hash to the same result. But the fact that null hashes to the same value as 0 seems insignificant.

As far as I know, the most common use of a .NET hash function is for a Hashtable, HashSet, or Dictionary key, and the fact that zero and null happen to land in the same bucket will have an insignificant effect on overall performance.
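
A small sketch of that point (the BucketDemo harness is invented for illustration): a set can hold at most one null, so the shared bucket carries at most one extra entry and lookups stay effectively O(1).

using System;
using System.Collections.Generic;

class BucketDemo
{
    static void Main()
    {
        // null and 0 produce the same hash code, so they share a bucket...
        Console.WriteLine(default(int?).GetHashCode()); // 0
        Console.WriteLine(((int?)0).GetHashCode());     // 0

        // ...but a set contains at most one null, so the collision costs
        // at most one extra equality check per lookup on that bucket.
        var set = new HashSet<int?> { null, 0, 1, 2 };
        Console.WriteLine(set.Contains(null)); // True
        Console.WriteLine(set.Contains(0));    // True
    }
}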

