4

I want to generate UUIDs based on objects. Objects that are equal need to have the same UUID.

I read about the type 3 UUIDs whose value is based on a name and namespace. java.util.UUID has a nameUUIDFromBytes method that takes a byte array as argument.

So I was thinking of serializing my objects into byte arrays and feeding those to the nameUUIDFromBytes method.

But I am confused about that namespace aspect of the UUID. Does that mean the UUID will be different when generated on another machine?

What is the best way to generate UUIDs such that when obj1.equals(obj2) == true, then uuid1.equals(uuid2) == true even when uuid1 is generated on another machine than uuid2?

12
  • 1
    Possible duplicate of Best implementation for hashCode method Commented Feb 8, 2017 at 10:59
  • 1
    @ΦXocę웃Пepeúpaツ I should have specified: I will obviously override the equals methods. That's not the issue here. The question is how I can generate UUIDs. Commented Feb 8, 2017 at 11:01
  • 1
    @roeygol Unless you suggest generating the UUIDs based on hashCodes, I hardly think these are duplicate questions. Commented Feb 8, 2017 at 11:01
  • 1
    @MaartenDhondt that I understood. But let's say that your Object is Person and equality is defined as a person with the same name (yes, bad idea, whatever). Now, if you generate a UUID from new Person("Jim") and then generate a UUID from the new Person("Jim") and they must be the same UUID; surely you can use "Jim" as the UUID?! Basically, work out what exactly equals means in your context and use that. You seem to be looking for some magic way for the computer to work out equals semantics for your domain - sadly no such magic exists. Commented Feb 8, 2017 at 11:06
  • 1
    @MaartenDhondt you can examine the source code thyself, there's no namespaces, just MD5 of byte[] input. Commented Feb 8, 2017 at 11:25

3 Answers 3

2

The result will be the same from different machines. It is like hashing them.

By using nameUUIDFromBytes you create a type 3 compliant uuid.

0

The UUID that an operating system will generate reserves the right to blend in information from the machine along with the time information &c. (In fact early Microsoft UUID generators would take network card information which was really quite insecure since it was possible to back that out from a generated UUID!).

So using your favourite UUID generator is not appropriate.

What you could do is to essentially enhance the methods used to create a hash code, extending that to 128 bits. Convert that byte array to a UUID format, and you're done.

1 Comment

There is a type 3 UUID which is based on an MD5 digest of a user-supplied name and does not use any machine-specific information. For example, see the source code of the java.util.UUID#nameUUIDFromBytes method: there is nothing in it except MD5 hashing and repacking of the resulting bytes as a UUID. So the author of the original question can definitely pack all the class fields used in equals into a byte[] with the help of Java serialization or NIO, and pass the resulting bytes to nameUUIDFromBytes to obtain a type 3 UUID.
0

Caveat: I just threw this code together roughly. Not tested. Use at your own risk.

Ignore the accepted Answer

The accepted Answer is incorrect.

The entire point of Version 3 and Version 5 name-based UUIDs is that, given the same inputs (namespace and name), you get the very same UUID output. The process is deterministic, repeatable. Contrary to that other Answer, there is no “blending in” of information from the local machine.

By the way, “name-based” is a bit of a misnomer. Some folks refer to these two Versions of UUID more descriptively as “hash-based”.

Example code

Define your namespace. Here we use a DNS domain name owned by our company.

final String domainName = "awesome-app.basil.work";

You may want to make this a static final constant in your app(s) for re-use.

Define the “name” for the object you want to generate a UUID.

For example, say we want to identify products in this Java record class:

public record Product( UUID id , String partNo , String vendorCode , String name ){}

Example data:

String partNo = "223-612-3";
String vendorCode = "TacomaScrew";
String productName = "M6-1.0 X 12 Mm Metric Socket Head Cap Screws - Class 12.9, Zinc, Coarse 100/PKG";

We want to use values that are stable, that will not change over time, for a reproducible UUID value. So we likely do not want to use the name of the product. Such product titles are likely to change on occasion. Instead we want to use only the part number and the vendor code. Those should never change. And together those should never collide.

String name = partNo + vendorCode ;  // Ignoring product name. 

Concatenate the namespace and name.

String namespaceAndName = domainName + name ;

Convert that to an array of bytes. Tip: Specify a charset explicitly to ensure that you always convert to the exact same byte values. The entire point here is to use the very same namespace across all the places where you generate the UUIDs.

byte[] bytes = namespaceAndName.getBytes( StandardCharsets.UTF_8 );

Generate a Version 3 UUID using the UUID class bundled with Java, the java.util.UUID class. When we submit our array of byte values, this function hashes those per the MD5 standard. Every proper implementation of MD5 hash-generation will produce the very same output when given the very same input. The MD5 digest is exactly 128 bits, the same size as a UUID, so no truncation is needed. Those bits are used as our UUID, save for the four Version bits and two Variant bits whose values are defined (hard-coded) by the UUID standard.

UUID uuid = UUID.nameUUIDFromBytes( bytes );

Now we can assemble our Product record object.

Product productScrew = new Product ( uuid , partNo , vendorCode , productName ) ;

We can generate text in standard canonical format to represent the 128-bit value of our UUID.

String uuidHexAndDash = uuid.toString() ;  // The "hex-and-dash" string format defined by RFC 9562. 

For your copy-and-paste convenience, here is the entire example.

final String domainName = "awesome-app.basil.work";

String partNo = "223-612-3";
String vendorCode = "TacomaScrew";
String productName = "M6-1.0 X 12 Mm Metric Socket Head Cap Screws - Class 12.9, Zinc, Coarse 100/PKG";

String name = partNo + vendorCode;  // Ignoring the product name.
String namespaceAndName = domainName + name;
byte[] bytes = namespaceAndName.getBytes( StandardCharsets.UTF_8 );

UUID uuid = UUID.nameUUIDFromBytes( bytes );
String uuidHexAndDash = uuid.toString();  // The "hex-and-dash" string format defined by RFC 9562.

IO.println( "bytes = " + Arrays.toString( bytes ) );
IO.println( uuidHexAndDash );

bytes = [97, 119, 101, 115, 111, 109, 101, 45, 97, 112, 112, 46, 98, 97, 115, 105, 108, 46, 119, 111, 114, 107, 50, 50, 51, 45, 54, 49, 50, 45, 51, 84, 97, 99, 111, 109, 97, 83, 99, 114, 101, 119]

ff54ffdf-aff5-34b0-b64f-7e631eacf694

Notice how every time you input that particular domain name, part number, and vendor code, you always get the very same UUID generated.

By the way, the above is a simplistic approach that works well enough if the UUIDs are used in-house only. If instead you intend for these UUIDs to circulate on the Internet or amongst the public, then you should more formally define your “namespace” as a UUID to be registered with the IANA. See the discussion in the standard, RFC 9562. You would change the above code like this:

UUID namespaceUuid = UUID.nameUUIDFromBytes( domainName.getBytes( StandardCharsets.UTF_8 ) ) ;  // Register this UUID with the IANA to avoid collisions. 

byte[] nameBytes = name.getBytes( StandardCharsets.UTF_8 ) ;
ByteBuffer bb = ByteBuffer.allocate( 16 + nameBytes.length );
bb.putLong( namespaceUuid.getMostSignificantBits() );  // Prefix the name with the 16 bytes of the namespace UUID.
bb.putLong( namespaceUuid.getLeastSignificantBits() );
bb.put( nameBytes ) ;
byte[] bytes = bb.array();

UUID uuid = UUID.nameUUIDFromBytes( bytes );

Your questions

You said:

Does that mean the UUID will be different when generated on another machine?

No. Same inputs get the same output, generating the very same UUID.

Again, the entire point of Version 3 & 5 UUIDs is to get the same UUID from the same inputs. If you have one namespace and one name on a machine in Paris, and a machine in Casablanca has the same namespace and the same name, they will both generate the exact same UUID. The only coordination needed between the machines is that they must both be informed ahead of time as to the chosen namespace value. And the machines must agree on what fields of data to use as the “name”.

Hashing

Version 3 & 5 UUIDs run your inputs (namespace and name) through a hashing algorithm to generate an arbitrary but deterministic value. A UUID holds 128 bits. The MD5 hash used by Version 3 is exactly 128 bits, so nothing is truncated. The SHA-1 hash used by Version 5 is 160 bits, too long for a UUID, so only the first 128 bits are kept. This truncation raises the chances of a duplicate, but not practically so for common purposes. The resulting bits are then laid down as the bits of the new UUID, except for six semantic bits required by the UUID specification to identify the UUID Version and Variant. And, voilà, you have your predictably generated UUID.
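To make those steps concrete, here is a rough sketch that hashes with MD5 and overwrites the six semantic bits by hand. The class name ManualVersion3 and the method fromBytes are made up for illustration; the sketch assumes that java.util.UUID.nameUUIDFromBytes does nothing beyond these steps, which the demo checks by comparing the two results.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class ManualVersion3 {

    static UUID fromBytes(byte[] input) {
        byte[] hash;
        try {
            // MD5 yields 16 bytes = 128 bits, exactly the size of a UUID.
            hash = MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x30);  // Four Version bits set to 3.
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80);  // Two Variant bits set to binary 10.
        ByteBuffer bb = ByteBuffer.wrap(hash);       // Big-endian by default.
        return new UUID(bb.getLong(), bb.getLong()); // First 8 bytes, then last 8 bytes.
    }

    public static void main(String[] args) {
        byte[] bytes = "awesome-app.basil.work223-612-3TacomaScrew".getBytes(StandardCharsets.UTF_8);
        UUID manual = fromBytes(bytes);
        UUID builtIn = UUID.nameUUIDFromBytes(bytes);
        System.out.println("same as built-in: " + manual.equals(builtIn));
        System.out.println("version = " + manual.version() + ", variant = " + manual.variant());
    }
}
```

Nothing in this code reads the clock, the network card, or anything else from the local machine, which is why two machines given the same bytes end up with the same UUID.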

Version 5 is the exact same as Version 3 but swaps out MD5 to instead use the more modern SHA-1 hashing algorithm. MD5 has some flaws, and is not as cryptographically-strong as SHA-1. But, (a) The java.util.UUID class in Java does not provide for generation of Version 5 UUIDs, and (b) if your business domain does not have security concerns regarding the generation of these UUIDs, then Version 3 works well enough. If you really want Version 5, obtain a third-party UUID generating library or roll your own.
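If you do decide to roll your own Version 5 generator, a minimal sketch might look like the following. The class name Version5 and the method nameUuidV5 are hypothetical, not part of any library. The sketch assumes the namespace is itself a UUID and the name is text encoded as UTF-8; the DNS namespace UUID in the demo is the well-known one listed in the UUID standard.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class Version5 {

    static UUID nameUuidV5(UUID namespace, String name) {
        // Input to the hash: the 16 bytes of the namespace UUID, then the name bytes.
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer input = ByteBuffer.allocate(16 + nameBytes.length);
        input.putLong(namespace.getMostSignificantBits());
        input.putLong(namespace.getLeastSignificantBits());
        input.put(nameBytes);

        byte[] hash;
        try {
            // SHA-1 yields 20 bytes = 160 bits, so only the first 16 bytes are used below.
            hash = MessageDigest.getInstance("SHA-1").digest(input.array());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        hash[6] = (byte) ((hash[6] & 0x0f) | 0x50);  // Four Version bits set to 5.
        hash[8] = (byte) ((hash[8] & 0x3f) | 0x80);  // Two Variant bits set to binary 10.

        ByteBuffer truncated = ByteBuffer.wrap(hash, 0, 16);  // Drop the last 4 bytes.
        return new UUID(truncated.getLong(), truncated.getLong());
    }

    public static void main(String[] args) {
        UUID dnsNamespace = UUID.fromString("6ba7b810-9dad-11d1-80b4-00c04fd430c8");
        UUID uuid = nameUuidV5(dnsNamespace, "awesome-app.basil.work");
        System.out.println(uuid + " (version " + uuid.version() + ")");
    }
}
```

Treat this as a starting point rather than a vetted implementation. A maintained third-party library is probably the safer choice for production use.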

You said:

What is the best way to generate UUIDs such that when obj1.equals(obj2) == true, then uuid1.equals(uuid2) == true

If you really want that… Whatever fields you compare in your implementation of the override of Object#equals must be used as the name fed into the Version 3 or 5 UUID generator.

So we really should change the example record class used in the example code above. In our UUID generation, we examined only two of the record's four fields, partNo and vendorCode, setting aside name and the UUID id itself. To meet your exact requirement of:

obj1.equals(obj2) == true, then uuid1.equals(uuid2) == true

… we need to add overrides to our record for equals and hashCode.

public record Product( UUID id, String partNo, String vendorCode, String name ) {
    @Override
    public boolean equals( Object o ) {
        if ( this == o ) return true;
        if ( o == null || getClass() != o.getClass() ) return false;
        Product product = (Product) o;
        return Objects.equals( this.partNo, product.partNo ) &&
               Objects.equals( this.vendorCode, product.vendorCode );
    }

    @Override
    public int hashCode() {
        return Objects.hash( this.partNo, this.vendorCode );
    }
}
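One way to keep equals and the UUID in agreement is to derive the id inside the record from the very same fields. The sketch below is only one possible arrangement: the static factory method named of and the NAMESPACE constant are hypothetical additions, and the plain string concatenation follows the simple in-house approach shown earlier.

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.UUID;

record Product(UUID id, String partNo, String vendorCode, String name) {

    // Assumed namespace, the same domain name used in the earlier example.
    static final String NAMESPACE = "awesome-app.basil.work";

    // Hypothetical factory: builds the id from the same fields that equals compares.
    static Product of(String partNo, String vendorCode, String name) {
        String input = NAMESPACE + partNo + vendorCode;  // Ignoring the product name.
        UUID id = UUID.nameUUIDFromBytes(input.getBytes(StandardCharsets.UTF_8));
        return new Product(id, partNo, vendorCode, name);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Product other
                && Objects.equals(this.partNo, other.partNo)
                && Objects.equals(this.vendorCode, other.vendorCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(this.partNo, this.vendorCode);
    }

    public static void main(String[] args) {
        Product a = Product.of("223-612-3", "TacomaScrew", "Old product title");
        Product b = Product.of("223-612-3", "TacomaScrew", "New product title");
        System.out.println("equal objects: " + a.equals(b));
        System.out.println("equal UUIDs:   " + a.id().equals(b.id()));
    }
}
```

The canonical constructor of a record stays public, so a caller could still pass in a mismatched id. Whether to guard against that is a design decision for your own code.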

You commented:

reason I am not using the fields (that matter for equality) directly is because my objects have dozens of fields and concatenating them to create a uuid will result in a String that's way too long. Hashing such a string might be the solution

Hashing is exactly what the Version 3 or 5 UUID generator does for you!

So, no, the string you pass as the name for the UUID will not be too long. The hashing algorithm always produces a result of a fixed number of bits regardless of how long the input was. If you do not understand that, study up on hashing at Wikipedia etc.
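For an object with dozens of fields, you might stream the fields into a byte array rather than build one giant string. The following sketch goes one step beyond the plain concatenation used earlier: it writes the length of each field before its bytes, so that the pair ("ab", "c") cannot produce the same input as ("a", "bc"). The class FieldsToUuid and its method uuidOf are hypothetical names, the fields are assumed to be non-null strings, and the resulting UUIDs will differ from those of the plain concatenation above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class FieldsToUuid {

    static UUID uuidOf(String namespace, String... fields) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeField(out, namespace);
            for (String field : fields) {
                writeField(out, field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // However long the input, the MD5 hash inside is always 128 bits.
        return UUID.nameUUIDFromBytes(buffer.toByteArray());
    }

    // Writes a 4-byte length followed by the UTF-8 bytes of the value.
    private static void writeField(DataOutputStream out, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    public static void main(String[] args) {
        UUID first = uuidOf("awesome-app.basil.work", "223-612-3", "TacomaScrew");
        UUID second = uuidOf("awesome-app.basil.work", "223-612-3", "TacomaScrew");
        System.out.println(first.equals(second));
    }
}
```

Every machine that generates these UUIDs must write the same fields in the same order, just as the machines must agree on the namespace.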
