64

I have a class that contains the following two properties:

public int Id      { get; private set; }
public T[] Values  { get; private set; }

I have made it IEquatable<T> and overriden the object.Equals like this:

public override bool Equals(object obj)
{
    return Equals(obj as SimpleTableRow<T>);
}

public bool Equals(SimpleTableRow<T> other)
{
    // Check for null
    if(ReferenceEquals(other, null))
        return false;

    // Check for same reference
    if(ReferenceEquals(this, other))
        return true;

    // Check for same Id and same Values
    return Id == other.Id && Values.SequenceEqual(other.Values);
}

When having override object.Equals I must also override GetHashCode of course. But what code should I implement? How do I create a hashcode out of a generic array? And how do I combine it with the Id integer?

public override int GetHashCode()
{
    return // What?
}

9 Answers 9

98

Because of the problems raised in this thread, I'm posting another reply showing what happens if you get it wrong... mainly, that you can't use the array's GetHashCode(); the correct behaviour is that no warnings are printed when you run it... switch the comments to fix it:

using System;
using System.Collections.Generic;
using System.Linq;
static class Program
{
    static void Main()
    {
        // first and second are logically equivalent
        SimpleTableRow<int> first = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6),
            second = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6);

        if (first.Equals(second) && first.GetHashCode() != second.GetHashCode())
        { // proven Equals, but GetHashCode() disagrees
            Console.WriteLine("We have a problem");
        }
        HashSet<SimpleTableRow<int>> set = new HashSet<SimpleTableRow<int>>();
        set.Add(first);
        set.Add(second);
        // which confuses anything that uses hash algorithms
        if (set.Count != 1) Console.WriteLine("Yup, very bad indeed");
    }
}
class SimpleTableRow<T> : IEquatable<SimpleTableRow<T>>
{

    public SimpleTableRow(int id, params T[] values) {
        this.Id = id;
        this.Values = values;
    }
    public int Id { get; private set; }
    public T[] Values { get; private set; }

    public override int GetHashCode() // wrong
    {
        return Id.GetHashCode() ^ Values.GetHashCode();
    }
    /*
    public override int GetHashCode() // right
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }
    */
    public override bool Equals(object obj)
    {
        return Equals(obj as SimpleTableRow<T>);
    }
    public bool Equals(SimpleTableRow<T> other)
    {
        // Check for null
        if (ReferenceEquals(other, null))
            return false;

        // Check for same reference
        if (ReferenceEquals(this, other))
            return true;

        // Check for same Id and same Values
        return Id == other.Id && Values.SequenceEqual(other.Values);
    }
}
Sign up to request clarification or add additional context in comments.

8 Comments

Could you explain the reasoning behind the right version of GetHashCode()?
@Vinko: Can you clarify? Do you mean "why does the hash-code matter?" - or "why that approach?". Given your rep and answer count, I'm assuming the latter; this is simply a way of getting a hash that takes all the values into account the "multiply by a prime and add the next hash" is a very common approach for hashing that avoids collisions (contrast xor; in which case a collection of "all 8s" could easily give a predictable hashcode of 0). Did I miss anything?
See also: stackoverflow.com/questions/263400#263416... different prime number, but same effect.
I started it anew, sorry. stackoverflow.com/questions/2626839/…, anyway, my aim is the equality, I wish I could skip the GetHashCode implementation part. And yes, the initial value is 0. Anyway I use EF so all the objects are initialized with ID as 0 and then the properties are individually set one by one, not by the initializer, that's the reason that if it's hashed when the ID isn't loaded yet it goes nuts, maybe you'd know how to solve it and enjoy both proper hashing along with equality on this mutable object.
Multiplying an int by a prime (that should also be close to a power of 2, e.g. 17 or 31 are good) spreads out the bits if the two input integers might be expected to be close in value (e.g. they are enum members, or IDs in a system with far fewer than 2 billion rows)
|
42

FWIW, it's very dangerous to use the contents of the Values in your hash code. You should only do this if you can guarantee that it will never change. However, since it is exposed, I don't think guaranteeing it is possible. The hashcode of an object should never change. Otherwise, it loses its value as a key in a Hashtable or Dictionary. Consider the hard-to-find bug of using an object as a key in a Hashtable, its hashcode changes because of an outside influence and you can no longer find it in the Hashtable!

4 Comments

This needs more upvote. I always made a wrong assumption between the concept of GetHashCode and a "MD5 hash" of a downloaded file. GetHashCode is not meant to compare the content, but the container. To make sure it points to the same place in the memory. I used GetHashCode to verify if an object changed since the last time it was saved to the database. I kept a cloned list just to compare the objects, but after overriding GetHashCode, everything based on hashtable started to behave weirdly. Now I just moved my override to it's own method and keep a dictionary with the "Content Hash"
@Pluc: "GetHashCode is meant to make sure the container points to the same place in the memory.", well not quite. It is meant to compare content, it's just that it can have false positives due to collisions. Like MD5, but with a greater chance of collisions.
its hashcode changes because of an outside influence and you can no longer find it in the Hashtable! - make sense to me, if the object was changed, it's no longer the same object, hence it should not be in the hashtable, dictionary, hashset or whatever.
Right, and because of this there is now a System.HashCode class available in the framework which allows you to combine hash codes in a safe way: Use the .Add method to add a hashcode (you got from .GetHashCode() of any variable) and use .ToHashCode() to calculate a new hash code based on the additions. Rewrite Marc's answer using those methods and you're again on a safe way. Thank you for mentioning your doubts!
4

Since the hashCode is kinda a key for storing the object (lllike in a hashtable), i would use just Id.GetHashCode()

1 Comment

Well, this is actually better than using Values.GetHashCode( ), as it preserves compatibility with Equals.
2

How about something like:

    public override int GetHashCode()
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }

This should be compatible with SequenceEqual, rather than doing a reference comparison on the array.

5 Comments

It is dangerous to compare the contents of Values because they aren't guaranteed to be the same for the lifetime of the object. Because the array is exposed, any external class can change it, which affects the hashcode!
The point, though, is that it is compatible with the Equals method as posted.
It affects the equality as well. And you can't use the reference to the arary to compute hashcode, because you end up with two equal objects with different hash codes.
@Grzenio - is that aimed at me or Dustin? I don't use the reference for exactly that reason...
Sorry for the confusion, it was a reply to Dustin's comment here and his code at the same time.
2

I just had to add another answer because one of the more obvious (and easiest to implement) solutions were not mentioned - not including the collection in your GetHashCode calculation!

The main thing that seemed to have forgotten here is that the uniqueness from the result of GetHashCode isn't required (or in many cases even possible). Unequal objects don't have to return unequal hash codes, the only requirement is that equal objects return equal hash codes. So by that definition, the following implementation of GetHashCode is correct for all objects (assuming there's a correct Equals implementation):

public override int GetHashCode() 
{ 
    return 42; 
} 

Of course this would yield the worst possible performance in hashtable lookup, O(n) instead of O(1), but it is still functionally correct.

With that in mind, my general recommendation when implementing GetHashCode for an object that happens to have any kind of collection as one or more of its members is to simply ignore them and calculate GetHashCode solely based on the other scalar members. This would work pretty well except if you put into a hash table a huge number of objects where all their scalar members have identical values, resulting in identical hash codes.

Ignoring collection members when calculating the hash code can also yield a performance improvement, despite the decreased distribution of the hash code values. Remember that using a hash code is supposed to improve performance in a hash table by not requiring to call Equals N times, and instead will only require calling GetHashCode once and a quick hash table lookup. If each object has an inner array with 10,000 items which all participate in the calculation of the hash code, any benefits gained by the good distribution would probably be lost. It would be better to have a marginally less distributed hash code if generating it is considerably less costly.

7 Comments

The purpose of the hash code is not just to select a hash bucket, but more generally to quickly weed out things which can be recognized as non-equal. A class should only base its concept of equality on that of an encapsulated sequence if the sequence is immutable. Assuming the sequence is immutable, the class should probably include sequence items in its computed hash code (which it should in turn probably cache). Otherwise if one adds to a dictionary ten objects with 5,000-item arrays that differ in the last element, trying to find an element would result in...
...all 5,000 elements of the new element being compared to all 5,000 elements of each of the ten objects. By contrast, if each item computed and cached a hash value for the array's contents, even if all ten hash values got mapped to the same hash bucket, the most that would happen if all hash values are distinct would be that the hash value of the new object would get compared to the cached hash values of the other ten. If a couple of hash values collide, that would still be no real problem--just one extra bunch of 5,000-element comparisons (rather than ten).
@supercat: You're doing a lot of assumptions here: That the sequence is immutable, that the object caches its own hashcode (I've never seen that), but most importantly that the object's only data upon which to base a hashcode is the sequence (notice that in the original question the object has an Id property which in almost all cases is enough to generate a unique hashcode). Anyway, you're talking about a very particular scenario which I fail to see how it relates to either the general case or the original question.
If the sequence isn't immutable, it shouldn't take part in equals. My assumption that the type was immutable was predicated upon the OP wanting to test sequences for equality. If one is likely to have and compare against each other many object instances which would be identical (according to the definition used by equals) except for some trait, that trait should generally be part of the hash code. Java considers it worthwhile to cache the hash code for its most common "immutable sequence-ish" type (string).
I can't believe I'm reading this. The last GetHashCode() I wrote specifically had to enumerate a collection in the object to work, as did the Equals().
|
1
public override int GetHashCode() {
   return Id.GetHashCode() ^ Values.GetHashCode();  
}

There are several good points in the comments and other answers. The OP should consider whether the Values would be used as part of the "key" if the object were used as a key in a dictionary. If so, then they should be part of the hash code, otherwise, not.

On the other hand, I'm not sure why the GetHashCode method should mirror SequenceEqual. It's meant to compute an index into a hash table, not to be the complete determinant of equality. If there are many hash table collisions using the algorithm above, and if they differ in the sequence of the Values, then an algorithm should be chosen that takes sequence into account. If sequence doesn't really matter, save the time and don't take it into account.

11 Comments

I also don't think arrays have GetHashCode implemented taking into account all elements
That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents)
Guys, I've already said it before, but be careful using all of the elements of an exposed array. The result of GetHashCode() should be same for the lifetime of the object, otherwise it won't work as a hashtable key. There's no guarantee that this array won't change, so don't use it in GetHashCode!
@Dustin: Good clarification. That's what I meant when I said "if the object is to be used as a key". Such objects may not change in a way that would change their hash code or equality, while they are acting as a key.
msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx "If two objects compare as equal, the GetHashCode method for each object must return the same value." - where "compare as equal" means "Equals()"
|
0

I would do it this way:

long result = Id.GetHashCode();
foreach(T val in Values)
    result ^= val.GetHashCode();
return result;

8 Comments

pretty reasonable - note that xor can lead to a lot of collisions; generally a multiplier/addition is preferred
interesting, many people told me to use xor instead. I should read more about it then.
In response to that; what would the hash of {3,3,3,3} be? and {4,4,4,4}? or {4,0,0,4}? or {1,0,1,0}? You see the issue...
@MarcGravell: Multiplication is poor. Too bad C# doesn't have left or right bit roll.
@Backwards_Dave That would be a regular shift. In a rotation or circular shift, the bits that are shifted out one side are simultaneously shifted back into the other side. If you divide 0xF9 by 2 four consecutive times, you are left with 0x0F. But if you rotate 0xF9 right by 4 positions (assuming 8-bit registers), you are left with 0x9F.
|
0

I know this thread is pretty old, but I wrote this method to allow me to calculate hashcodes of multiple objects. It's been very helpful for this very case. It's not perfect, but it does meet my needs and most likely yours too.

I can't really take any credit for it. I got the concept from some of the .net gethashcode implementations. I'm using 419 (afterall, it's my favorite large prime), but you can choose just about any reasonable prime (not too small . . . not too large).

So, here's how I get my hashcodes:

using System.Collections.Generic;
using System.Linq;

public static class HashCodeCalculator
{
    public static int CalculateHashCode(params object[] args)
    {
        return args.CalculateHashCode();
    }

    public static int CalculateHashCode(this IEnumerable<object> args)
    {
        if (args == null)
            return new object().GetHashCode();

        unchecked
        {
            return args.Aggregate(0, (current, next) => (current*419) ^ (next ?? new object()).GetHashCode());
        }
    }
}

Comments

0

Provided that Id and Values will never change, and Values is not null...

public override int GetHashCode()
{
  return Id ^ Values.GetHashCode();
}

Note that your class is not immutable, since anyone can modify the contents of Values because it is an array. Given that, I wouldn't try to generate a hashcode using its contents.

6 Comments

That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents)
Right, but because the array is exposed and any external code can change it, it is frankly dangerous to compare the contents.
So I should really just use the HashCode of the Id?
Which means that... IF the result of Equals changes, the result of GetHashCode wouldn't necessarily have to change, but if GetHashCode changes, then Equals would also change?
Not necessarily. The reference to Values shouldn't change (unless you change it in your code) -- so it should be OK to use that. John Saunders has the best answer here.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.