10

Could you someone shed some light on the hashing function/algorithm Perl uses to map a string to an index ? Any relevant reading ?

5
  • What are you trying to do? Can you provide an example of some code that isn't working? Commented Jun 26, 2012 at 19:16
  • Any key which will create a collision :) Commented Jun 26, 2012 at 20:43
  • There is no two key that will always collide. The hashing is randomly perturbed when necessary. Commented Jun 26, 2012 at 20:48
  • 1
    You can tell a collision occurred by looking for a lack of increase in the left-hand number returned by perl -wE'for ("a".."z") { ++$h{$_}; say "".%h; }' Commented Jun 26, 2012 at 20:49
  • 1
    alertjean: Why would any collision be "code that isn't working"? I think you are expecting problems that just don't exist. Commented Jun 27, 2012 at 7:46

1 Answer 1

15

[This answer predates the hashing function change made in 5.28. See "Default Hash Function Change" in the perldelta for 5.28.]

PERL_HASH_INTERNAL_, defined in hv.h, copied below:

/* hash a key */
/* FYI: This is the "One-at-a-Time" algorithm by Bob Jenkins
 * from requirements by Colin Plumb.
 * (http://burtleburtle.net/bob/hash/doobs.html) */
/* The use of a temporary pointer and the casting games
 * is needed to serve the dual purposes of
 * (a) the hashed data being interpreted as "unsigned char" (new since 5.8,
 *     a "char" can be either signed or unsigned, depending on the compiler)
 * (b) catering for old code that uses a "char"
 *
 * The "hash seed" feature was added in Perl 5.8.1 to perturb the results
 * to avoid "algorithmic complexity attacks".
 *
 * If USE_HASH_SEED is defined, hash randomisation is done by default
 * If USE_HASH_SEED_EXPLICIT is defined, hash randomisation is done
 * only if the environment variable PERL_HASH_SEED is set.
 * For maximal control, one can define PERL_HASH_SEED.
 * (see also perl.c:perl_parse()).
 */

#define PERL_HASH_INTERNAL_(hash,str,len,internal) \
    STMT_START { \
       register const char * const s_PeRlHaSh_tmp = str; \
       register const unsigned char *s_PeRlHaSh = (const unsigned char *)s_PeRlHaSh_tmp; \
       register I32 i_PeRlHaSh = len; \
       register U32 hash_PeRlHaSh = (internal ? PL_rehash_seed : PERL_HASH_SEED); \
       while (i_PeRlHaSh--) { \
           hash_PeRlHaSh += *s_PeRlHaSh++; \
           hash_PeRlHaSh += (hash_PeRlHaSh << 10); \
           hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); \
       } \
       hash_PeRlHaSh += (hash_PeRlHaSh << 3); \
       hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); \
       (hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15)); \
   } STMT_END
Sign up to request clarification or add additional context in comments.

2 Comments

Is this implementation of some generic algorithm ?
@alertjean Yes, the comments from the code say it's a version of burtleburtle.net/bob/hash/doobs.html

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.