0

I have a set of 100,000 IDs I need to hash into an array that has 50 buckets.

The IDs are of the form: AA00000...AA99999. I know there are functions like md5 available, but those produce digests not indexes into an array. How could I implement a hash such that for each ID it returns an index into my array?

Thanks in advance. (Implementing this in Python)

1 Answer 1

2

Just use the built-in hash() function, and use modulus to limit the result to 50:

hash(yourid) % 50

From the documentation:

Return the hash value of the object (if it has one). Hash values are integers.

For your given inputs, this picks slots uniformly enough:

>>> from collections import Counter
>>> histogram = Counter(hash('AA{:05d}'.format(i)) % 50 for i in range(100000))
>>> for i in range(50):
...     print '{:4d}: {}'.format(histogram[i], '*' * (histogram[i] // 40))
... 
1932: ************************************************
1932: ************************************************
1941: ************************************************
1941: ************************************************
1908: ***********************************************
1908: ***********************************************
1974: *************************************************
1974: *************************************************
2012: **************************************************
2012: **************************************************
1898: ***********************************************
1898: ***********************************************
1954: ************************************************
1954: ************************************************
1925: ************************************************
1925: ************************************************
1995: *************************************************
1995: *************************************************
1982: *************************************************
1982: *************************************************
2023: **************************************************
2023: **************************************************
2025: **************************************************
2025: **************************************************
2070: ***************************************************
2070: ***************************************************
2042: ***************************************************
2042: ***************************************************
2028: **************************************************
2028: **************************************************
2120: *****************************************************
2120: *****************************************************
2064: ***************************************************
2064: ***************************************************
2100: ****************************************************
2100: ****************************************************
2057: ***************************************************
2057: ***************************************************
2039: **************************************************
2039: **************************************************
1981: *************************************************
1981: *************************************************
1956: ************************************************
1956: ************************************************
2000: **************************************************
2000: **************************************************
1982: *************************************************
1982: *************************************************
1992: *************************************************
1992: *************************************************
Sign up to request clarification or add additional context in comments.

5 Comments

why not reduce(operator.xor,map(ord,my_string)) % 50 :P (+1)
Thanks Martijn; does the hash() function generate uniform distribution?
@JoranBeasley: why have a dog, and then bark yourself, is what I'd like to know..
I'm running it now and its not giving me a uniform distribution (or even close). What about multiplying each number by 7 or 31 and then summing it together and then using modulus to limit?
@SamTurani: ah, see, you didn't ask that. I do believe it does though; I added a histogram for you.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.