2

I need to maintain an ordered collection of inputs that disallows duplicates.

The obvious option is LinkedHashSet, but a less obvious option might be a combination of HashSet and List. Here's an example showing both implementations:

class TestClass{
    private final Set<ThingIds> setOfThingIds;
    private final List<Thing> listOfThings;
    private final Set<Thing> linkedHashSetOfThings;

    // ... constructor instantiates empty objects

    void addThingsViaList(List<Thing> things){
        for(var thing: things){
            var id = getThingId(thing);
            if(setOfThingIds.contains(id)) continue;
            setOfThingIds.add(id);
            listOfThings.add(thing);
        }
    }

    void addThingsViaSet(List<Thing> things){
        for(var thing: things){
            linkedHashSetOfThings.add(thing); // thingId calc added to Thing's equals method
        }
    }
}

When would the first method be useful/required over the second?

One reason I could think of is if there were some bespoke expensive logic to be used for ids so adding it here would localise it and reduce the performance impact (as opposed to adding it on the equals() method where it would be used everywhere...though this is a weak argument since it could be stored on Thing as a field and invoked lazily from equals())...are there any other? Especially ones that aren't micro-optimisations?

5
  • Does it make sense that Thing's equals and hashCode methods only look at thingId and nothing else? These two methods will be used for everything, not just for the LinkedHashSet. Commented Jul 15 at 8:56
  • 4
    Logic can be simpler, there is no point calling set contains then add, just use return code of add for deciding whether to add to the list. Commented Jul 15 at 9:21
  • 5
    Do neither of those two options! Use LinkedHashMap with the id as key and Thing as value. Commented Jul 15 at 9:22
  • 3
    Quite a bit of chat over IDs and equality. Andorrax, is it actually the case that your object can have the same id yet not be equal? Commented Jul 15 at 11:22
  • 2
    As an aside seeing your LinkedHashSet as a SequencedSet since Java 21 seems appropriate here and useful for accessing in order. Or using SequencedMap implemented as LinkedHashMap. Commented Jul 15 at 12:16

3 Answers 3

5

The first approach has no objective advantages. You end up doing a continue; inside the loop, which is not helpful in making the code readable, you entangle two collections and you will need to worry about they being changed in an undesired manner by devs in a team not aware of the entanglement.

The second approach is more readable and handles implicitly stuff that you had to handle explicitly in the first approach.

The only "advantage" the first approach has is that you have a list of ids, but you can derive such an id list from the data-structure in the second list easily whenever you wish.

So the principles to apply here are:

  • "don't solve a problem you do not have"
  • "don't reimplement something that was implemented, you spend time, your solution will not necessarily be as good and you duplicate a feature that exists already"
  • "separate concerns, with your data-structures aim for their implicitly handling the problems they are capable of solving rather than doing it explicitly at the usage"

EDIT

In the comment-section disagreement was expressed on the applicability of the "don't solve a problem you do not have" principle as equals and hashCode is to be overriden just for this reason. However the point has its merit of course, I would argue that if a Thing needs to equal with another if their id matches, then not overriding the equals method is a flaw in the design and it needs to be overriden anyway, because thing1.equals(thing2) could return false even if their id matches if it is not overriden, leading to bugs.

So even if it was not to be ordered in a unique manner in a set, if we can postulate in general that they are equal when their id is matches, then we should do override equals anyways, so by the time we arrive to the linked hashset problem, we no longer have any doubts about overriding equals or not.

Sign up to request clarification or add additional context in comments.

4 Comments

"don't solve a problem you do not have" -> but the OP has a problem: the set needs to be unique based on 'id', but they are storing 'things'. They therefore need to change the equals and hashCode methods just for this reason. But those methods might be needed for other cases where the whole 'thing' needs to be checked and not just the id.
@k314159 thanks for your comment. I have edited my answer to reflect my thoughts on your remark.
I think the case brought up by @k314159 is when two Thing objects can have the same ID but are by design not equals due to other properties. Your edit addresses the case where equals was not overridden in the first place. If equals must be defined by more than just the ID, but the collection must be distinct based only on the ID, then using a sole LinkedHashSet<Thing> may not be an option.
@Slaw my position is that equals need to work consistently. So if two Things are to be equal despite their differences due to their id being equal (so it's the same Thing, albeit in different states), then equals needs to be overriden anyway. Otherwise, if equals need not to be overriden, but an overriden equals would be needed by the linked hashset, then indeed there is a problem to be addressed as you pointed out. There are neat ways to address that problem btw, but it is beyond the scope of this answer until it is confirmed that the asker indeed needs not to override equals.
4

First the conditions:

  • You want to keep the order of a LinkedHashSet: order on addition, a duplicate addition will not change the order.
  • ThingId will identify Thing uniquely.
  • The ThingId getter should not be a costly calculation.

Then the first alternative with separation simply does what the second alternative does more succinctly. So the second alternative seems best.

However there is also the possibility of LinkedHashMap.

private final Map<ThingId, Thing> linkedHashMapOfThings = new LinkedHashMao<>();

void addThingsViaMap(List<Thing> things){
    for (var thing: things) {
        linkedHashMapOfThings.put(getThingId(thing), thing);
    }
}

This provides a keySet().

Depending on the usage, especially Stream operations, having the ThingId might be nice.

Furthermore the Set implementation is somewhat a Map, so no extra costs.

1 Comment

This is exactly what I would do, with also a getThings() method that returns linkedHashMapOfThings.values() which will return a collection with the desired iteration order.
0

The two snippets are functionally different.

In the second snippet, you hold an ordered set of unique Thing objects. In the first snippet, you hold an ordered list of Thing objects and use a set to ensure theirs IDs are unique.
In other words, the second snippet will allow two different Things with the same ID, while the first will not. Whether this is a valid usecase or not depends on your business logic.

To make the first snippet equivelent to the first one, you'll need to use the set to store Things, not their IDs:

class TestClass {
    private final Set<Thing> setOfThing;
    private final List<Thing> listOfThings;
    private final Set<Thing> linkedHashSetOfThings;

    // ... constructor instantiates empty objects

    void addThingsViaList (List<Thing> things) {
        for (var thing: things) {
            if(setOfThing.contains(thing)) continue;
            setOfThingIds.add(thing);
            listOfThings.add(thing);
        }
    }

    // addThingsViaSet implementation omitted for brevity's sake
}

Comparing this implementation to the implementation of addThingsViaSet, I honestly see no upside to it.

3 Comments

Did you not see the comment on the second snippet, which says the Thing's equals method would have to be changed to use the id?
"In other words, the second snippet will allow two different Things with the same ID, while the second will not." I think you meant to write "first" for the second "second"
@knittl eap! indeed! thanks for noticing.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.