74

How to find whether the List<string> has duplicate values or not ?

I tried with below code. Is there any best way to achieve ?

var lstNames = new List<string> { "A", "B", "A" };

if (lstNames.Distinct().Count() != lstNames.Count())
{
    Console.WriteLine("List contains duplicate values.");
}
2
  • 1
    Sorry Guys.. I missed the simple logic. Commented Jan 16, 2013 at 16:59
  • 24
    Please don't say sorry. We all here for learning.. Commented Jan 16, 2013 at 17:02

5 Answers 5

126

Try to use GroupBy and Any like;

lstNames.GroupBy(n => n).Any(c => c.Count() > 1);

GroupBy method;

Groups the elements of a sequence according to a specified key selector function and projects the elements for each group by using a specified function.

Any method, it returns boolean;

Determines whether any element of a sequence exists or satisfies a condition.

Sign up to request clarification or add additional context in comments.

7 Comments

How is this better than the code in the OP? You still need to group all of the items, so you don't really have any short circuiting here.
This not only has to iterate through all the elements to build the groups, it then has to iterate through potentially all of the groups too. Your original coffee will be faster.
... Damn you, autocorrect.
According to my tests, original code is at least 1.5 times faster (depending on inputs) than this
How about replacing c.Count() > 1 with c.Skip(1).Any()?
|
51

If you're looking for the most efficient way of doing this,

var lstNames = new List<string> { "A", "B", "A" };
var hashset = new HashSet<string>();
foreach(var name in lstNames)
{
    if (!hashset.Add(name))
    {
        Console.WriteLine("List contains duplicate values.");
        break;
    }
}

will stop as soon as it finds the first duplicate. You can wrap this up in a method (or extension method) if you'll be using it in several places.

4 Comments

+1 performance ten times better in worst case, than in GroupBy
@IlyaIvanov Actually, in the worst case (no duplicates), it's about the same, maybe just a tad faster. In the best case (the first two items are duplicates) it's 100% faster, as it will be O(1) not O(n). In the general case it will be dependent on the actual rate of duplicates in the underlying data, while GroupBy and Distinct take the same time regardless of the underlying data.
"O" means "worst case" by the way. There is no "in the best case it will be O(x)"
@JohnShedletsky 'O(f)' represents the set of functions that don't grow faster than f, that is to say, g(x) <= f(x) * C for g in O(f) and some constant C, if x is large enough. It doesn't imply anything about best or worst cases.
29

A generalized and compact extension version of the answer based on hash technique:

public static bool AreAnyDuplicates<T>(this IEnumerable<T> list)
{
    var hashset = new HashSet<T>();
    return list.Any(e => !hashset.Add(e));
}

5 Comments

cool, I've added it to my linq extensions, I added an overload to provide a comparer though.
I know this pretty old and even though creating an extension method is cool but this is a really bad from performance perspective.. It should be using group by rather than trying to insert each and every list object to hashset.
@curiousBoy I'm pretty sure that GroupBy is implemented using some kind of hashed structure internally, so basically it should have about the same performance. According to my best knowledge adding elements to a HashSet is "cheap" in terms of computation and uses at most the same amount of memory as the original list. Also, I'm not sure but having GroupBy and Any after each other might not be very lazy while it's obvious that this solution will stop on first duplicate item. Could you please clarify why you think it has poor performance?
@Eluvatar I know you answered this a long time ago, but, I'd love to see your code if you're willing to share. Thanks!
@ErickBrown The constructor of a HashSet<T> does accept a custom comparer, I think @Eluvatar meant to expose that as a parameter of this extension.
12
var duplicateExists = lstNames.GroupBy(n => n).Any(g => g.Count() > 1);

3 Comments

Hmmmm.. I missed the simple logic.
I think Any() is preferred than Count(), but I don't know the performance difference between Distinct() and GroupBy()
I think in case of List<someClass> you need to group by all of the items and again you need to apply Any() of all items. I am not sure how can i compare with just using Count() in my example.
0
 class Program
{
    static void Main(string[] args)
    {
        var listFruits = new List<string> { "Apple", "Banana", "Apple", "Mango" };
        if (FindDuplicates(listFruits)) { WriteLine($"Yes we find duplicate"); };
        ReadLine();
    }
    public static bool FindDuplicates(List<string> array)
    {
        var dict = new Dictionary<string, int>();
        foreach (var value in array)
        {
            if (dict.ContainsKey(value))
                dict[value]++;
            else
                dict[value] = 1;
        }
        foreach (var pair in dict)
        {
            if (pair.Value > 1)
                return true;
            else
                return false;
        }
        return false;
    }
}  

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.