
I know there are a lot of questions about "removing duplicates from a list". I liked the solution with HashSet. However, what I have is a list of String[], and it won't work with it, probably because stringArray1.equals(stringArray2) returns false even if the two arrays have the same contents; to compare string arrays you have to use Arrays.equals, which is not what HashSet does.

So I have a userList of String[] users, each with only 2 strings in it: the username and the userID. Since both are linked (there is only one userID per username), it would be enough for me to compare only one of those strings.

What I need is a fast way to remove duplicates from the list.

I thought about something like this:

List<String> userNamesList = new ArrayList<String>();
List<String[]> userListWithoutDuplicates = new ArrayList<String[]>();
for(String[] user : userList){
    if(!userNamesList.contains(user[0])){
        userNamesList.add(user[0]);
        userListWithoutDuplicates.add(user);
    }
}

However, this needs two new lists and a loop (I'm pretty sure any other solution would still need the loop).

I'm wondering if there isn't a better solution. I thought something like that would already be implemented somewhere.
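For reference, the loop above can stay as it is and still become O(n) by tracking the seen names in a HashSet instead of a List, since HashSet.contains (and add) is O(1). A minimal sketch, assuming the same String[] {username, userID} layout (the class name and sample data are made up):

```java
import java.util.*;

public class DedupUsers {
    // Remove duplicate users, keeping the first occurrence of each username.
    static List<String[]> dedupe(List<String[]> userList) {
        Set<String> seenNames = new HashSet<>();          // O(1) membership checks
        List<String[]> result = new ArrayList<>();
        for (String[] user : userList) {
            if (seenNames.add(user[0])) {                 // add() returns false if the name was already seen
                result.add(user);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> users = new ArrayList<>();
        users.add(new String[]{"alice", "1"});
        users.add(new String[]{"alice", "1"});
        users.add(new String[]{"bob", "2"});
        System.out.println(dedupe(users).size()); // prints 2
    }
}
```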

EDIT: I got my arrays from an SQL query. In fact, I have a DB and some users. One user searches for other users matching certain conditions in the DB, and the DB sends back a list of String[] {username, userID} to this user. So I already have a User class, which contains far more than just the username and ID. I have one instance of this class per connected user, but the DB can't access those instances, so it can't send them. I thought a String array was an easy solution. I didn't realize that, in certain cases, a user can be referenced more than once in the DB and so selected more than once. That's why I got duplicates in my list.

  • Why are you using a String[] instead of a User class? Commented Sep 3, 2018 at 11:03
  • Which version of Java you are using? Commented Sep 3, 2018 at 11:03
  • you should turn the arrays into objects with 2 fields instead and have them override equals() and hashcode() Commented Sep 3, 2018 at 11:08
  • I'm using Java 10. And I got my array from an SQL query. I'll edit the post to explain that better. Commented Sep 3, 2018 at 11:14
  • @Abila yes I understand but you can still turn them into objects when you are retrieving the data probably.. how are you accessing your DB? Commented Sep 3, 2018 at 11:18

6 Answers


If you are using Java 8 or later, you can use streams:

String[] arrWithDuplicates = new String[]{"John", "John", "Mary", "Paul"};
String[] arrWithoutDuplicates = Arrays.stream(arrWithDuplicates).distinct().toArray(String[]::new);

In arrWithoutDuplicates you'll have "John", "Mary" and "Paul"


4 Comments

He has a list of arrays
So he can use flatMap function, for example list.stream().flatMap(Arrays::stream).distinct().collect(Collectors.toList());
No, he can't, because that would create a single stream containing both the username and userID strings. I would put them into an object. It is Java after all :)
you're right, creating User class with proper equals() and hashCode() methods will be the best solution from "clean code" point of view, combined with using streams to remove duplicates from a collection of users or using Set

The best approach would be to map every user returned from the DB to an object with the two mentioned strings username and userID. Then hashCode and equals should be implemented according to your definition of equality/duplicate. Based on this there are many ways to get rid of duplicates. You could add all found users to a Set or stream over a list of such users and call Stream.distinct() to reduce the users to unique ones:

List<User> distinctUsers = users.stream().distinct().collect(Collectors.toList());
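A sketch of what such a User class could look like (the field names and the ID-based equality are assumptions taken from the question; adapt equals/hashCode to your own definition of a duplicate):

```java
import java.util.*;
import java.util.stream.*;

public class User {
    private final String username;
    private final String userId;

    public User(String username, String userId) {
        this.username = username;
        this.userId = userId;
    }

    public String getUsername() { return username; }
    public String getUserId()   { return userId; }

    // Two users are considered equal when they have the same ID
    // (per the question, there is only one userID per username).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        return userId.equals(((User) o).userId);
    }

    @Override
    public int hashCode() {
        return userId.hashCode();
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("alice", "1"), new User("alice", "1"), new User("bob", "2"));
        List<User> distinct = users.stream().distinct().collect(Collectors.toList());
        System.out.println(distinct.size()); // prints 2
    }
}
```

With equals/hashCode in place, Stream.distinct(), HashSet, and every other equality-based collection work as expected.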

If you need to go on with the current structure, you cannot use Stream.distinct() as it would compare string arrays by their object identity. The equality has to be specified explicitly. We can do this e.g. in the following way:

Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
List<String[]> distinctUsers = users.stream()
        .collect(Collectors.groupingBy(comparingBy))
        .values().stream()
        .map(u -> u.get(0))
        .collect(Collectors.toList());

This will group all users by the Function comparingBy. comparingBy should reflect your definition of equality, so one of two equal users is a duplicate. Within each group, u.get(0) keeps the element that appeared first in encounter order. The result is a distinct list, a list without duplicates.

Another data type would be the mentioned Set. When creating a TreeSet it's also possible to provide the definition of equality explicitly. We can use the same comparingBy as above:

Set<String[]> distinctUsers = new TreeSet<>(Comparator.comparing(comparingBy));
distinctUsers.addAll(users);

1 Comment

This is completely correct. +1 for Set with comparator

You can use the toMap collector to provide a custom keyMapper function which serves as a uniqueness test, then simply use the values of the map as your result.

For your uniqueness test, I think it makes more sense to use index 1 (the userID) instead of index 0 (the userName). However, if you wish to change it back, use arr[0] instead of arr[1] below:

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","456"});
List<String[]> userListNoDupes = new ArrayList<>(userList.stream()
    .collect(Collectors.toMap(arr-> arr[1], Function.identity(), (a,b)-> a)).values());
for(String[] user: userListNoDupes) {
    System.out.println(Arrays.toString(user));
}

Output:

[George, 123]

[George, 456]

1 Comment

This works and avoids using another List with only names in it. Thank you.

Edited: converted userNamesList to a HashSet, thanks @Aris_Kortex. This reduces the complexity from O(n^2) to O(n), because the complexity of searching in a HashSet is O(1).

    Set<String> userSet = new HashSet<>(userNamesList);
    List<String[]> userListWithoutDuplicates = userList.stream()
        .filter(user -> !userSet.contains(user[0]))
        .collect(Collectors.toList());

distinct() on the stream does not help here: it compares the arrays by reference (arrays inherit equals() from Object), so even two arrays whose 0th and 1st elements are equal are not treated as duplicates.

But as I understand, the OP would like to remove only those users whose names (0th element) appear in some predefined list.
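A quick demonstration of why distinct() cannot see array contents (class name made up): two arrays with identical elements are still different objects, and arrays do not override equals().

```java
import java.util.*;
import java.util.stream.*;

public class ArrayDistinctDemo {
    public static void main(String[] args) {
        String[] a = {"John", "42"};
        String[] b = {"John", "42"};              // same contents, different object

        // Arrays inherit equals() from Object, i.e. reference equality:
        System.out.println(a.equals(b));          // false
        System.out.println(Arrays.equals(a, b));  // true

        // Hence distinct() keeps both arrays:
        System.out.println(Stream.of(a, b).distinct().count()); // 2
    }
}
```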

6 Comments

Not very optimal, as this will effectively re-iterate the whole list for every item in the stream.
Can be optimized a little by converting userNamesList to a HashSet before streaming.
Maybe, but I do not see any reason not to use distinct().
While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
I said it was enough to remove according to the names only (0th element), since if the names are the same, the IDs (1st element) will be the same too. So distinct() may work as well. I've never used stream() so I'll have a look at it.

I certainly think that you should use a Set rather than a List in the first place. We can adjust this according to your time and space constraints. Here is a simple 2-line answer for your code:

        Set<String> set = new HashSet<>(userNamesList);
        List<String> list = new ArrayList<>(set);

A working example runs here: https://ideone.com/JznZCE. It really depends on what you need to achieve; if your users are unique, you should simply get a Set rather than a List. Also, if the info is contained in a User object instead of a String, the order of users need not be changed by this, and lookups by ID or name can be implemented later.

You can then change how equality is checked by overriding the equals and hashCode methods of the User class with a custom implementation.

Hope this helps!

Edit: If the info comes from a DB, see how you can get a unique list by using the DISTINCT keyword (or a similar MySQL construct), to keep this logic out of your code.

3 Comments

See the second sentence of my post.
@Ablia You need to handle the comparison logic in your custom equals and hashcode method overriding the default.
Yeah, that's what I thought too; however, I have no idea how to do that. I mean, overriding the default code of a Java class. But I'm looking into it.

Check this topic: Removing duplicate elements from a List

You can convert the list to a Set (which doesn't allow duplicates) and then back to a List if you really need that type of collection.

2 Comments

Not an answer. You need to answer the question rather than link to something.
I already said in my post that it won't work because I have a list of String arrays. The method HashSet uses to detect duplicates is Object1.equals(Object2), which doesn't work with arrays.
