I have a class that internally is just an array of integers. Once constructed the array never changes. I'd like to pre-compute a good hashcode so that this class can be very efficiently used as a key in a Dictionary. The length of the array is less than about 30 items, and the integers are between -1000 and 1000 in general.
-
1Dictionary key is unique and if your object store array of values, and key is computed based on them then there is no guarantee that you can get a unique hash key for the dictionaryFadrian Sudaman– Fadrian Sudaman2010-08-04 10:59:55 +00:00Commented Aug 4, 2010 at 10:59
-
2@Fadrian: The OP does not want to compute a key but a HashValue. Look up what that means. Hashvalues are pseudo-unique.Henk Holterman– Henk Holterman2010-08-04 11:57:19 +00:00Commented Aug 4, 2010 at 11:57
-
Thanks Henk. I know how hash value are suppose to work and I may have misread the intend of the question when I posted the comment and its great that you pointed that out.Fadrian Sudaman– Fadrian Sudaman2010-08-04 12:04:35 +00:00Commented Aug 4, 2010 at 12:04
-
Fadrian, Henk was right. My intent was not to get a unique code but to get something pretty close to that that is quickly computable so that I dont need to do a full Equals very often. I realise that if you know the data you expect fairly well that it is possible to make a more appropriate choice which is what my question is seeking. A lot of the answers below are quite mathematical and I will need time to understand them.freddy smith– freddy smith2010-08-04 13:01:24 +00:00Commented Aug 4, 2010 at 13:01
10 Answers
Not very clever, but sufficient for most practical purposes:
EDIT: changed due to comment of Henk Holterman, thanks for that.
int hc = array.Length;
foreach (int val in array)
{
hc = unchecked(hc * 314159 + val);
}
If you need something more sophisticated, look here.
13 Comments
hc=unchecked(hc*SHIFTVAL+array[i]); to be independent of compiler settings.Equals method take action. As far as I remember, the dicionary uses the Equals method to differ objects with the same hash (that's why the Equals method shouldn't realy on the hash itself).For an array of values generally between -1000 and 1000, I would probably use something like this:
static int GetHashCode(int[] values)
{
int result = 0;
int shift = 0;
for (int i = 0; i < values.Length; i++)
{
shift = (shift + 11) % 21;
result ^= (values[i]+1024) << shift;
}
return result;
}
1 Comment
You may use CRC32 checksum. Here is the code:
[CLSCompliant(false)]
public class Crc32 {
uint[] table = new uint[256];
uint[] Table { get { return table; } }
public Crc32() {
MakeCrcTable();
}
void MakeCrcTable() {
for (uint n = 0; n < 256; n++) {
uint value = n;
for (int i = 0; i < 8; i++) {
if ((value & 1) != 0)
value = 0xedb88320 ^ (value >> 1);
else
value = value >> 1;
}
Table[n] = value;
}
}
public uint UpdateCrc(uint crc, byte[] buffer, int length) {
uint result = crc;
for (int n = 0; n < length; n++) {
result = Table[(result ^ buffer[n]) & 0xff] ^ (result >> 8);
}
return result;
}
public uint Calculate(Stream stream) {
long pos = stream.Position;
const int size = 0x32000;
byte[] buf = new byte[size];
int bytes = 0;
uint result = 0xffffffff;
do {
bytes = stream.Read(buf, 0, size);
result = UpdateCrc(result, buf, bytes);
}
while (bytes == size);
stream.Position = pos;
return ~result;
}
}
3 Comments
I'm using this here
var arrayHash = string.Join(string.Empty, array).GetHashCode();
If a element changed in the array, you will get a new hash.
1 Comment
You could take a different approach and use a recursive dictionary for each value in your int array. This way you can leave .net to do primitive type hashing.
internal class DictionaryEntry<TKey, TValue>
{
public Dictionary<TKey, DictionaryEntry<TKey, TValue>> Children { get; private set; }
public TValue Value { get; private set; }
public bool HasValue { get; private set; }
public void SetValue(TValue value)
{
Value = value;
HasValue = true;
}
public DictionaryEntry()
{
Children = new Dictionary<TKey, DictionaryEntry<TKey, TValue>>();
}
}
internal class KeyStackDictionary<TKey, TValue>
{
// Helper dictionary to work with a stack of keys
// Usage:
// var dict = new KeyStackDictionary<int, string>();
// int[] keyStack = new int[] {23, 43, 54};
// dict.SetValue(keyStack, "foo");
// string value;
// if (dict.GetValue(keyStack, out value))
// {
// }
private DictionaryEntry<TKey, TValue> _dict;
public KeyStackDictionary()
{
_dict = new DictionaryEntry<TKey, TValue>();
}
public void SetValue(TKey[] keyStack, TValue value)
{
DictionaryEntry<TKey, TValue> dict = _dict;
for (int i = 0; i < keyStack.Length; i++)
{
TKey key = keyStack[i];
if (dict.Children.ContainsKey(key))
{
dict = dict.Children[key];
}
else
{
var child = new DictionaryEntry<TKey, TValue>();
dict.Children.Add(key, child);
dict = child;
}
if (i == keyStack.Length - 1)
{
dict.SetValue(value);
}
}
}
// returns false if the value is not found using the key stack
public bool GetValue(TKey[] keyStack, out TValue value)
{
DictionaryEntry<TKey, TValue> dict = _dict;
for (int i = 0; i < keyStack.Length; i++)
{
TKey key = keyStack[i];
if (dict.Children.ContainsKey(key))
{
dict = dict.Children[key];
}
else
{
break;
}
if (i == keyStack.Length - 1 && dict.HasValue)
{
value = dict.Value;
return true;
}
}
value = default(TValue);
return false;
}
}
Comments
I think choosing a good hash-algorithm would have to be based on the distribution (in a probability sense) of the integer values.
Have a look at Wikipedia for a list of algorithms
Comments
Any CRC (or even XOR) should be ok.
3 Comments
Method via string:
Guid hash = string.Join(':', array.Order()).MD5Hash();
An array should be ordered. Use any separator for recognizing different arrays, e.g. [1, 11] and [111] after join. GetHashCode() for string can give the same code for absolutely different strings (https://learn.microsoft.com/en-us/dotnet/api/system.string.gethashcode?view=net-9.0), so use MD5 algorithm to get hash code of string as Guid. Sure consider the performance MD5 in case thousands ops per sec., but for most regular tasks it's pretty applicable.
public static Guid MD5Hash(this string text)
{
var hashed = MD5.HashData(Encoding.Unicode.GetBytes(text));
return Guid.Parse(BitConverter.ToString(hashed, 0).Replace("-", string.Empty));
}
Comments
I would recommend:
HashCode.Combine(array)
For .NET Core 2.1 / .NET Standard 2.1 / .NET 5 and later.
2 Comments
HashCode.Add() is given in stackoverflow.com/questions/59375124/….