While some algorithms like MD5 haven't quite stood the test of time with regards to the security industry, others like the SHA family of functions have (thus far). Yet despite the discovery, or theoretical existence of collisions within their domains, cryptographic hash functions still provide an incredibly well distributed range of fixed length output mappings for data of arbitrary length and type – why aren’t they used in data structures more often? Isn’t the goal of a hash table (provided a good function) to map every input to a unique key, such that chaining, nested tables and other collision handling techniques become entirely moot? It’s certainly convenient being able to feed almost anything to a function, and know the exact length of the key you will receive! Seems like an ideal use for retired security protocols to me.
-
1In hash tables you almost always get collisions after you reduce modulo the size of the table, since you probably not want to create a table with 2^128 entries ;)CodesInChaos– CodesInChaos2014-03-12 09:58:56 +00:00Commented Mar 12, 2014 at 9:58
-
There has been a recent movement to SipHash, which is a keyed crypto hash in order to avoid attacker provoked collisions as a form of denial of service attack. (Search for hash-DoS)CodesInChaos– CodesInChaos2014-03-12 10:00:01 +00:00Commented Mar 12, 2014 at 10:00
-
2Note that the collision resistance of cryptographic hash functions completely depends on the output size. With 32 bit hashes, even an ideal hash function has to expect collisions at around 2^16 keys (65 thousand, that's a perfectly reasonable size for a hash table), due to the birthday problem.user395760– user3957602014-03-12 11:36:54 +00:00Commented Mar 12, 2014 at 11:36
1 Answer
Cryptographic hash function can and are used as the hash function in hash tables. Only not so often. The drawback for the cryptographic hashes is that they are very 'expensive' in terms of processing power needed compared to the more traditional hash functions used in hash tables.
Traditional hash functions have all the characteristics that you need for a hash table, but require way less CPU cycles to perform. This has changes a bit now most chipsets include hardware acceleration for these cryptographic hashes though.
And the 'index' generated with a cryptographic hash function is too large. SO you need to trim it down by either a reduction or masking. (You don't need 16 bytes of hash table indexes ;))
All in all, they are often not worth the hassle..