56

I want to generate a unique hash code for a string input in Android. Is there a predefined library for this, or do we have to generate it manually? If anybody knows, please share a link or some code.

7
  • 1
    What about built-in hashCode for Strings? Commented May 25, 2011 at 6:59
  • 3
    unique hash code? why? and how do you even think that's possible? Commented May 25, 2011 at 7:01
  • 9
    Please elaborate. Unique hash codes are impossible (unless they can have an infinite length), since there is an infinity of possible strings. Commented May 25, 2011 at 7:03
  • That is totally wrong. I can think of five ways to create a unique hash code just off the top of my head. It all starts with overriding the hashCode () function. Anyway, the last comment is a bit of an oversimplification: If a hash is created on a large enough domain with a robust generator, the chances of a collision can be extremely small. IFF you override hashCode to use a thread-safe incrementor, you can have unique values. Most of the time, though, when implemented correctly, it just does not matter probability-wise. Commented May 3, 2013 at 22:19
  • 3
    You have no idea what the OP's context is. Absolutely no idea. Assuming that when he says "unique" that he doesn't mean that is a huge stretch. Anyhow, the challenge remains: Show us how. Commented Nov 12, 2015 at 22:47

8 Answers 8

66

It depends on what you mean:

  • As mentioned String.hashCode() gives you a 32 bit hash code.

  • If you want (say) a 64-bit hashcode you can easily implement it yourself.

  • If you want a cryptographic hash of a String, the Java crypto libraries include implementations of MD5, SHA-1 and so on. You'll typically need to turn the String into a byte array, and then feed that to the hash generator / digest generator. For example, see @Bryan Kemp's answer.

  • If you want a guaranteed unique hash code, you are out of luck. Hashes and hash codes are non-unique.

A Java String of length N has 65536 ^ N possible states, and requires an integer with 16 * N bits to represent all possible values. If you write a hash function that produces an integer with a smaller range (i.e. fewer than 16 * N bits), you will eventually find cases where more than one String hashes to the same integer; i.e. the hash codes cannot be unique. This follows from the Pigeonhole Principle, and there is a straightforward mathematical proof. (You can't fight math and win!)
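To make the pigeonhole argument concrete, here is a minimal sketch showing two distinct two-character strings that share a String.hashCode() value. The class and method names are made up for illustration; only String.hashCode() and String.equals() come from the JDK.

```java
final class PigeonholeDemo {

    // True when two distinct strings produce the same 32-bit hashCode().
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa": 65 * 31 + 97 = 2112
        // "BB": 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode());
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB"));

        // Concatenating colliding blocks of equal length yields more collisions.
        System.out.println(collide("AaAa", "BBBB"));
    }
}
```

Both strings hash to 2112, so even among very short inputs the 32-bit hash code is not unique.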

But if "probably unique" with a very small chance of non-uniqueness is acceptable, then crypto hashes are a good answer. The math will tell you how big (i.e. how many bits) the hash has to be to achieve a given (low enough) probability of non-uniqueness.
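As a rough sketch of that math, the usual birthday-bound approximation says that n hashes of b bits each collide with probability of about 1 - e^(-n(n-1) / 2^(b+1)). The code below assumes the hash values are uniformly distributed, which holds approximately for crypto hashes but is not guaranteed for String.hashCode(). The class and method names are hypothetical.

```java
final class BirthdayBound {

    // Approximate probability that at least two of n uniformly random
    // hashes of the given bit width are equal (birthday approximation).
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double x = n * (n - 1) / (2.0 * space);
        // -expm1(-x) is 1 - e^(-x), computed without losing precision for tiny x
        return -Math.expm1(-x);
    }

    public static void main(String[] args) {
        double n = 1_000_000; // number of distinct strings being hashed
        for (int bits : new int[] {32, 64, 128, 160, 256}) {
            System.out.printf("%3d bits: %.3e%n", bits, collisionProbability(n, bits));
        }
    }
}
```

Under this approximation a 32-bit hash reaches roughly a 50% collision chance at about 77,000 inputs, while a 160-bit digest such as SHA-1 stays negligibly small for any realistic number of inputs.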


3 Comments

64-bit hashcode: for completeness if you want a 64-bit one, from sfussenegger in stackoverflow.com/questions/1660501/…
So, a 32-bit hash can only uniquely identify a String with 2 characters?
Basically ... yes. (Assuming character == arbitrary char value. It gets a bit more complicated if character means Unicode codepoint ... or (say) ASCII codepoint.)
39

This is a class I use to create Message Digest hashes

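import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Hex {

    // Returns the SHA-1 digest of the input as a 40-character lowercase hex string.
    public String makeSHA1Hash(String input)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.reset();
        // Encode explicitly as UTF-8 so the hash does not depend on the platform charset
        byte[] buffer = input.getBytes("UTF-8");
        md.update(buffer);
        byte[] digest = md.digest();

        StringBuilder hexStr = new StringBuilder(digest.length * 2);
        for (int i = 0; i < digest.length; i++) {
            // Adding 0x100 and dropping the first digit keeps the leading zero of each byte
            hexStr.append(Integer.toString((digest[i] & 0xff) + 0x100, 16).substring(1));
        }
        return hexStr.toString();
    }
}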

Comments

9
String input = "some input string";
int hashCode = input.hashCode();
System.out.println("input hash code = " + hashCode);

6 Comments

@Vladimir - by definition no hash code is defined to be unique! Hashcode needs to be well distributed, uniqueness idea is a faulty understanding of the OP.
if hash code was unique, that'd be one hell of a compression algorithm.
e.g. try "[email protected]" and "[email protected]" they have the same hash values when using hashCode ;)
@Simon, just ran your example in .NET because I was curious. They must have different base hashing algorithms because they aren't exact matches there. dotnetfiddle.net/6YJRpV
Maybe what OP meant by unique is: unique-for-a-given-input-string (no two hashes should be generated for the same string).
4

I use this; I tested it as a key for my EhCacheManager memory map.

It's cleaner, I suppose:

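    /**
     * Returns the SHA-256 hash of a String value as a hex string.
     *
     * @param text the string to hash
     * @return the hex-encoded SHA-256 digest, or "" if hashing fails
     */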
    public static String getHash256(String text) {
        try {
            return org.apache.commons.codec.digest.DigestUtils.sha256Hex(text);
        } catch (Exception ex) {
            Logger.getLogger(HashUtil.class.getName()).log(Level.SEVERE, null, ex);
            return "";
        }
    }

I am using Maven, but this is the jar: commons-codec-1.9.jar

Comments

4

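This worked for me:

    public static long getUniqueLongFromString(String value) {
        // Use an explicit charset so the result does not depend on the platform default
        byte[] bytes = value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        return java.util.UUID.nameUUIDFromBytes(bytes).getMostSignificantBits();
    }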

1 Comment

Great one-liner. Under the hood, nameUUIDFromBytes is pretty much the same as Bryan Kemp's answer: it uses MessageDigest and performs an MD5 hash. But since this is already encapsulated, it makes my project "cleaner" =)
3

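You can use this code to generate a hash code for a given string.

String s = "some input string";
int hash = 7;
for (int i = 0; i < s.length(); i++) {
    // Same polynomial as String.hashCode(), but seeded with 7 instead of 0
    hash = hash * 31 + s.charAt(i);
}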

1 Comment

Why did you start at 7?
2

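A few lines of Java code:

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public static void main(String[] args) throws Exception {
    String str = "test string";
    MessageDigest messageDigest = MessageDigest.getInstance("MD5");
    // Hash the encoded bytes; str.length() counts chars, not bytes
    messageDigest.update(str.getBytes(StandardCharsets.UTF_8));
    // %032x keeps the leading zeros that BigInteger.toString(16) would drop
    System.out.println("MD5: " + String.format("%032x", new BigInteger(1, messageDigest.digest())));
}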

Comments

0

Let's take a look at the stock hashCode() method:

public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        for (int i = 0; i < count; i++) {
            h = 31 * h + charAt(i);
        }
        hash = h;
    }
    return h;
}

The block of code above comes from the java.lang.String class. As you can see, it is a 32-bit hash code, which is fair enough if you are using it on a small data set. If you are looking for a hash code with more than 32 bits, you might want to check out this link: http://www.javamex.com/tutorials/collections/strong_hash_code_implementation.shtml
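For illustration, here is one way a wider, non-cryptographic hash could look. This is a sketch of 64-bit FNV-1a over the UTF-8 bytes of the string; it is a swapped-in technique, not the implementation from the linked page, and the class and method names are made up. Note that merely widening the 31-multiplier loop above to a long would still map "Aa" and "BB" to the same value, because 65 * 31 + 97 equals 66 * 31 + 66 as plain integers.

```java
import java.nio.charset.StandardCharsets;

final class Fnv1a64 {

    private static final long OFFSET_BASIS = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
    private static final long PRIME = 0x100000001b3L;             // FNV 64-bit prime

    // Returns the 64-bit FNV-1a hash of the string's UTF-8 bytes.
    static long hash64(String s) {
        long h = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff); // mix in the next byte (unsigned)
            h *= PRIME;      // multiplication wraps modulo 2^64
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash64("Aa")));
        System.out.println(Long.toHexString(hash64("BB")));
    }
}
```

A 64-bit result is still not unique, it only makes accidental collisions far less likely than with 32 bits; for input that may be chosen adversarially, a cryptographic digest is the safer choice.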

Comments
