GetHashCode override of object containing generic array

Question

I have a class that contains the following two properties:

public int Id      { get; private set; }
public T[] Values  { get; private set; }

I have made it IEquatable<T> and overriden the object.Equals like this:

public override bool Equals(object obj)
{
    return Equals(obj as SimpleTableRow<T>);
}

public bool Equals(SimpleTableRow<T> other)
{
    // Check for null
    if(ReferenceEquals(other, null))
        return false;

    // Check for same reference
    if(ReferenceEquals(this, other))
        return true;

    // Check for same Id and same Values
    return Id == other.Id && Values.SequenceEqual(other.Values);
}

When having override object.Equals I must also override GetHashCode of course. But what code should I implement? How do I create a hashcode out of a generic array? And how do I combine it with the Id integer?

public override int GetHashCode()
{
    return // What?
}

Marc Gravell · Accepted Answer · 2009-03-12 15:16:08Z

98

Because of the problems raised in this thread, I'm posting another reply showing what happens if you get it wrong... mainly, that you can't use the array's GetHashCode(); the correct behaviour is that no warnings are printed when you run it... switch the comments to fix it:

using System;
using System.Collections.Generic;
using System.Linq;
static class Program
{
    static void Main()
    {
        // first and second are logically equivalent
        SimpleTableRow<int> first = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6),
            second = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6);

        if (first.Equals(second) && first.GetHashCode() != second.GetHashCode())
        { // proven Equals, but GetHashCode() disagrees
            Console.WriteLine("We have a problem");
        }
        HashSet<SimpleTableRow<int>> set = new HashSet<SimpleTableRow<int>>();
        set.Add(first);
        set.Add(second);
        // which confuses anything that uses hash algorithms
        if (set.Count != 1) Console.WriteLine("Yup, very bad indeed");
    }
}
class SimpleTableRow<T> : IEquatable<SimpleTableRow<T>>
{

    public SimpleTableRow(int id, params T[] values) {
        this.Id = id;
        this.Values = values;
    }
    public int Id { get; private set; }
    public T[] Values { get; private set; }

    public override int GetHashCode() // wrong
    {
        return Id.GetHashCode() ^ Values.GetHashCode();
    }
    /*
    public override int GetHashCode() // right
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }
    */
    public override bool Equals(object obj)
    {
        return Equals(obj as SimpleTableRow<T>);
    }
    public bool Equals(SimpleTableRow<T> other)
    {
        // Check for null
        if (ReferenceEquals(other, null))
            return false;

        // Check for same reference
        if (ReferenceEquals(this, other))
            return true;

        // Check for same Id and same Values
        return Id == other.Id && Values.SequenceEqual(other.Values);
    }
}

answered Mar 12, 2009 at 15:16

Marc Gravell

1.1m273 gold badges2.6k silver badges3k bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

Vinko Vrsalovic Over a year ago

Could you explain the reasoning behind the right version of GetHashCode()?

Marc Gravell Over a year ago

@Vinko: Can you clarify? Do you mean "why does the hash-code matter?" - or "why that approach?". Given your rep and answer count, I'm assuming the latter; this is simply a way of getting a hash that takes all the values into account the "multiply by a prime and add the next hash" is a very common approach for hashing that avoids collisions (contrast xor; in which case a collection of "all 8s" could easily give a predictable hashcode of 0). Did I miss anything?

Marc Gravell Over a year ago

See also: stackoverflow.com/questions/263400#263416... different prime number, but same effect.

Shimmy Weitzhandler Over a year ago

I started it anew, sorry. stackoverflow.com/questions/2626839/…, anyway, my aim is the equality, I wish I could skip the GetHashCode implementation part. And yes, the initial value is 0. Anyway I use EF so all the objects are initialized with ID as 0 and then the properties are individually set one by one, not by the initializer, that's the reason that if it's hashed when the ID isn't loaded yet it goes nuts, maybe you'd know how to solve it and enjoy both proper hashing along with equality on this mutable object.

Eric J. Over a year ago

Multiplying an int by a prime (that should also be close to a power of 2, e.g. 17 or 31 are good) spreads out the bits if the two input integers might be expected to be close in value (e.g. they are enum members, or IDs in a system with far fewer than 2 billion rows)

|

Chin · Accepted Answer · 2015-03-25 20:26:02Z

42

FWIW, it's very dangerous to use the contents of the Values in your hash code. You should only do this if you can guarantee that it will never change. However, since it is exposed, I don't think guaranteeing it is possible. The hashcode of an object should never change. Otherwise, it loses its value as a key in a Hashtable or Dictionary. Consider the hard-to-find bug of using an object as a key in a Hashtable, its hashcode changes because of an outside influence and you can no longer find it in the Hashtable!

edited Mar 25, 2015 at 20:26

Chin

20.9k38 gold badges113 silver badges173 bronze badges

answered Mar 12, 2009 at 14:17

Dustin Campbell

9,8932 gold badges33 silver badges33 bronze badges

4 Comments

Pluc Over a year ago

This needs more upvote. I always made a wrong assumption between the concept of GetHashCode and a "MD5 hash" of a downloaded file. GetHashCode is not meant to compare the content, but the container. To make sure it points to the same place in the memory. I used GetHashCode to verify if an object changed since the last time it was saved to the database. I kept a cloned list just to compare the objects, but after overriding GetHashCode, everything based on hashtable started to behave weirdly. Now I just moved my override to it's own method and keep a dictionary with the "Content Hash"

vgru Over a year ago

@Pluc: "GetHashCode is meant to make sure the container points to the same place in the memory.", well not quite. It is meant to compare content, it's just that it can have false positives due to collisions. Like MD5, but with a greater chance of collisions.

Mykhailo Seniutovych Over a year ago

its hashcode changes because of an outside influence and you can no longer find it in the Hashtable! - make sense to me, if the object was changed, it's no longer the same object, hence it should not be in the hashtable, dictionary, hashset or whatever.

Matt Over a year ago

Right, and because of this there is now a System.HashCode class available in the framework which allows you to combine hash codes in a safe way: Use the .Add method to add a hashcode (you got from .GetHashCode() of any variable) and use .ToHashCode() to calculate a new hash code based on the additions. Rewrite Marc's answer using those methods and you're again on a safe way. Thank you for mentioning your doubts!

Jhonny D. Cano -Leftware- · Accepted Answer · 2009-03-12 14:19:25Z

4

Since the hashCode is kinda a key for storing the object (lllike in a hashtable), i would use just Id.GetHashCode()

answered Mar 12, 2009 at 14:19

Jhonny D. Cano -Leftware-

18.1k15 gold badges84 silver badges103 bronze badges

1 Comment

Thomas Dufour Over a year ago

Well, this is actually better than using Values.GetHashCode( ), as it preserves compatibility with Equals.

Marc Gravell · Accepted Answer · 2009-03-12 14:14:13Z

2

How about something like:

    public override int GetHashCode()
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }

This should be compatible with SequenceEqual, rather than doing a reference comparison on the array.

answered Mar 12, 2009 at 14:14

Marc Gravell

1.1m273 gold badges2.6k silver badges3k bronze badges

5 Comments

Dustin Campbell Over a year ago

It is dangerous to compare the contents of Values because they aren't guaranteed to be the same for the lifetime of the object. Because the array is exposed, any external class can change it, which affects the hashcode!

Marc Gravell Over a year ago

The point, though, is that it is compatible with the Equals method as posted.

Grzenio Over a year ago

It affects the equality as well. And you can't use the reference to the arary to compute hashcode, because you end up with two equal objects with different hash codes.

Marc Gravell Over a year ago

@Grzenio - is that aimed at me or Dustin? I don't use the reference for exactly that reason...

Grzenio Over a year ago

Sorry for the confusion, it was a reply to Dustin's comment here and his code at the same time.

Allon Guralnek · Accepted Answer · 2012-08-21 16:26:14Z

2

I just had to add another answer because one of the more obvious (and easiest to implement) solutions were not mentioned - not including the collection in your GetHashCode calculation!

The main thing that seemed to have forgotten here is that the uniqueness from the result of GetHashCode isn't required (or in many cases even possible). Unequal objects don't have to return unequal hash codes, the only requirement is that equal objects return equal hash codes. So by that definition, the following implementation of GetHashCode is correct for all objects (assuming there's a correct Equals implementation):

public override int GetHashCode() 
{ 
    return 42; 
}

Of course this would yield the worst possible performance in hashtable lookup, O(n) instead of O(1), but it is still functionally correct.

With that in mind, my general recommendation when implementing GetHashCode for an object that happens to have any kind of collection as one or more of its members is to simply ignore them and calculate GetHashCode solely based on the other scalar members. This would work pretty well except if you put into a hash table a huge number of objects where all their scalar members have identical values, resulting in identical hash codes.

Ignoring collection members when calculating the hash code can also yield a performance improvement, despite the decreased distribution of the hash code values. Remember that using a hash code is supposed to improve performance in a hash table by not requiring to call Equals N times, and instead will only require calling GetHashCode once and a quick hash table lookup. If each object has an inner array with 10,000 items which all participate in the calculation of the hash code, any benefits gained by the good distribution would probably be lost. It would be better to have a marginally less distributed hash code if generating it is considerably less costly.

answered Aug 21, 2012 at 16:26

Allon Guralnek

16.2k6 gold badges65 silver badges96 bronze badges

7 Comments

supercat Over a year ago

The purpose of the hash code is not just to select a hash bucket, but more generally to quickly weed out things which can be recognized as non-equal. A class should only base its concept of equality on that of an encapsulated sequence if the sequence is immutable. Assuming the sequence is immutable, the class should probably include sequence items in its computed hash code (which it should in turn probably cache). Otherwise if one adds to a dictionary ten objects with 5,000-item arrays that differ in the last element, trying to find an element would result in...

supercat Over a year ago

...all 5,000 elements of the new element being compared to all 5,000 elements of each of the ten objects. By contrast, if each item computed and cached a hash value for the array's contents, even if all ten hash values got mapped to the same hash bucket, the most that would happen if all hash values are distinct would be that the hash value of the new object would get compared to the cached hash values of the other ten. If a couple of hash values collide, that would still be no real problem--just one extra bunch of 5,000-element comparisons (rather than ten).

Allon Guralnek Over a year ago

@supercat: You're doing a lot of assumptions here: That the sequence is immutable, that the object caches its own hashcode (I've never seen that), but most importantly that the object's only data upon which to base a hashcode is the sequence (notice that in the original question the object has an Id property which in almost all cases is enough to generate a unique hashcode). Anyway, you're talking about a very particular scenario which I fail to see how it relates to either the general case or the original question.

supercat Over a year ago

If the sequence isn't immutable, it shouldn't take part in equals. My assumption that the type was immutable was predicated upon the OP wanting to test sequences for equality. If one is likely to have and compare against each other many object instances which would be identical (according to the definition used by equals) except for some trait, that trait should generally be part of the hash code. Java considers it worthwhile to cache the hash code for its most common "immutable sequence-ish" type (string).

Joshua Over a year ago

I can't believe I'm reading this. The last GetHashCode() I wrote specifically had to enumerate a collection in the object to work, as did the Equals().

|

Hakan Fıstık · Accepted Answer · 2017-03-03 13:31:53Z

1

public override int GetHashCode() {
   return Id.GetHashCode() ^ Values.GetHashCode();  
}

There are several good points in the comments and other answers. The OP should consider whether the Values would be used as part of the "key" if the object were used as a key in a dictionary. If so, then they should be part of the hash code, otherwise, not.

On the other hand, I'm not sure why the GetHashCode method should mirror SequenceEqual. It's meant to compute an index into a hash table, not to be the complete determinant of equality. If there are many hash table collisions using the algorithm above, and if they differ in the sequence of the Values, then an algorithm should be chosen that takes sequence into account. If sequence doesn't really matter, save the time and don't take it into account.

edited Mar 3, 2017 at 13:31

Hakan Fıstık

19.9k16 gold badges124 silver badges152 bronze badges

answered Mar 12, 2009 at 14:12

John Saunders

162k26 gold badges252 silver badges403 bronze badges

11 Comments

Grzenio Over a year ago

I also don't think arrays have GetHashCode implemented taking into account all elements

Marc Gravell Over a year ago

That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents)

Dustin Campbell Over a year ago

Guys, I've already said it before, but be careful using all of the elements of an exposed array. The result of GetHashCode() should be same for the lifetime of the object, otherwise it won't work as a hashtable key. There's no guarantee that this array won't change, so don't use it in GetHashCode!

John Saunders Over a year ago

@Dustin: Good clarification. That's what I meant when I said "if the object is to be used as a key". Such objects may not change in a way that would change their hash code or equality, while they are acting as a key.

Marc Gravell Over a year ago

msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx "If two objects compare as equal, the GetHashCode method for each object must return the same value." - where "compare as equal" means "Equals()"

|

Grzenio · Accepted Answer · 2009-03-12 14:13:23Z

0

I would do it this way:

long result = Id.GetHashCode();
foreach(T val in Values)
    result ^= val.GetHashCode();
return result;

answered Mar 12, 2009 at 14:13

Grzenio

36.8k49 gold badges163 silver badges241 bronze badges

8 Comments

Marc Gravell Over a year ago

pretty reasonable - note that xor can lead to a lot of collisions; generally a multiplier/addition is preferred

Grzenio Over a year ago

interesting, many people told me to use xor instead. I should read more about it then.

Marc Gravell Over a year ago

In response to that; what would the hash of {3,3,3,3} be? and {4,4,4,4}? or {4,0,0,4}? or {1,0,1,0}? You see the issue...

Joshua Over a year ago

@MarcGravell: Multiplication is poor. Too bad C# doesn't have left or right bit roll.

Carvo Loco Over a year ago

@Backwards_Dave That would be a regular shift. In a rotation or circular shift, the bits that are shifted out one side are simultaneously shifted back into the other side. If you divide 0xF9 by 2 four consecutive times, you are left with 0x0F. But if you rotate 0xF9 right by 4 positions (assuming 8-bit registers), you are left with 0x9F.

|

D. Patrick · Accepted Answer · 2010-12-18 19:19:16Z

I know this thread is pretty old, but I wrote this method to allow me to calculate hashcodes of multiple objects. It's been very helpful for this very case. It's not perfect, but it does meet my needs and most likely yours too.

I can't really take any credit for it. I got the concept from some of the .net gethashcode implementations. I'm using 419 (afterall, it's my favorite large prime), but you can choose just about any reasonable prime (not too small . . . not too large).

So, here's how I get my hashcodes:

using System.Collections.Generic;
using System.Linq;

public static class HashCodeCalculator
{
    public static int CalculateHashCode(params object[] args)
    {
        return args.CalculateHashCode();
    }

    public static int CalculateHashCode(this IEnumerable<object> args)
    {
        if (args == null)
            return new object().GetHashCode();

        unchecked
        {
            return args.Aggregate(0, (current, next) => (current*419) ^ (next ?? new object()).GetHashCode());
        }
    }
}

Jason Plank · Accepted Answer · 2011-11-17 15:09:49Z

0

Provided that Id and Values will never change, and Values is not null...

public override int GetHashCode()
{
  return Id ^ Values.GetHashCode();
}

Note that your class is not immutable, since anyone can modify the contents of Values because it is an array. Given that, I wouldn't try to generate a hashcode using its contents.

edited Nov 17, 2011 at 15:09

Jason Plank

2,3325 gold badges32 silver badges40 bronze badges

answered Mar 12, 2009 at 14:14

Dustin Campbell

9,8932 gold badges33 silver badges33 bronze badges

6 Comments

Marc Gravell Over a year ago

That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents)

Dustin Campbell Over a year ago

Right, but because the array is exposed and any external code can change it, it is frankly dangerous to compare the contents.

Svish Over a year ago

So I should really just use the HashCode of the Id?

Svish Over a year ago

Which means that... IF the result of Equals changes, the result of GetHashCode wouldn't necessarily have to change, but if GetHashCode changes, then Equals would also change?

Dustin Campbell Over a year ago

Not necessarily. The reference to Values shouldn't change (unless you change it in your code) -- so it should be OK to use that. John Saunders has the best answer here.

|

Collectives™ on Stack Overflow

GetHashCode override of object containing generic array

9 Answers 9

8 Comments

4 Comments

1 Comment

5 Comments

7 Comments

11 Comments

8 Comments

Comments

6 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

9 Answers 9

8 Comments

4 Comments

1 Comment

5 Comments

7 Comments

11 Comments

8 Comments

Comments

6 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related