Using LINQ, from a List<int>, how can I retrieve a list that contains entries repeated more than once and their values?
14 Answers
The easiest way to solve the problem is to group the elements based on their value, and then pick a representative of the group if there is more than one element in the group. In LINQ, this translates to:
var query = lst.GroupBy(x => x)
               .Where(g => g.Count() > 1)
               .Select(y => y.Key)
               .ToList();
If you want to know how many times the elements are repeated, you can use:
var query = lst.GroupBy(x => x)
               .Where(g => g.Count() > 1)
               .Select(y => new { Element = y.Key, Counter = y.Count() })
               .ToList();
This will return a List of an anonymous type; each element exposes the properties Element and Counter, which hold the information you need.
And lastly, if it's a dictionary you are looking for, you can use
var query = lst.GroupBy(x => x)
               .Where(g => g.Count() > 1)
               .ToDictionary(x => x.Key, y => y.Count());
This will return a dictionary, with your element as key, and the number of times it's repeated as value.
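For anyone who wants to try the queries above end to end, here is a minimal sketch written as a top-level program (this assumes C# 9 or later; the sample values in lst are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: 1 occurs three times, 2 occurs twice.
var lst = new List<int> { 1, 2, 3, 1, 4, 2, 1 };

// Values that occur more than once, each reported a single time.
List<int> repeated = lst.GroupBy(x => x)
                        .Where(g => g.Count() > 1)
                        .Select(g => g.Key)
                        .ToList();

// The same values mapped to their number of occurrences.
Dictionary<int, int> counts = lst.GroupBy(x => x)
                                 .Where(g => g.Count() > 1)
                                 .ToDictionary(g => g.Key, g => g.Count());

Console.WriteLine(string.Join(", ", repeated)); // 1, 2
foreach (var pair in counts)
{
    Console.WriteLine($"{pair.Key} appears {pair.Value} times");
}
```

In LINQ to Objects, GroupBy yields the groups in the order in which each key first appears, which is why repeated comes out as 1, 2 here.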
11 Comments
for (int i = 0; i < duplicates.Count; i++)
{
    int duplicate = duplicates[i];
    duplicatesLocation.Add(duplicate, new List<int>());
    for (int k = 0; k < hitsList.Length; k++)
    {
        if (hitsList[k].Contains(duplicate))
        {
            duplicatesLocation.ElementAt(i).Value.Add(k);
        }
    }
    // remove duplicates according to some rules.
}
Find out whether an enumerable contains any duplicates (Key stands for whichever property you compare on; for plain values use x => x):
var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);
Find out if the values in an enumerable are all unique:
var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);
To find the duplicate values only:
var duplicates = list.GroupBy(x => x).Where(g => g.Count() > 1);
E.g.
var list = new[] {1,2,3,1,4,2};
GroupBy groups the numbers by value, and each group holds every occurrence of its key, so g.Count() is the number of times the value is repeated. After that, we just keep the groups whose values are repeated more than once.
To find the unique values only:
var unique = list.GroupBy(x => x).Where(g => g.Count() == 1);
E.g.
var list = new[] {1,2,3,1,4,2};
GroupBy groups the numbers by value, and g.Count() is the number of times each value is repeated. After that, we just keep the groups whose values occur only once, which means they are unique.
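A small sketch of the four checks on a plain int array, assuming C# 9 top-level statements. Note that Where returns the groups themselves, so a Select(g => g.Key) is added here (my addition, not part of the answer) to get the bare values:

```csharp
using System;
using System.Linq;

var list = new[] { 1, 2, 3, 1, 4, 2 };

// For plain values the key selector is simply x => x.
bool anyDuplicate = list.GroupBy(x => x).Any(g => g.Count() > 1);  // true
bool allUnique = list.GroupBy(x => x).All(g => g.Count() == 1);    // false

// Project each surviving group to its key to get the values.
var duplicateValues = list.GroupBy(x => x)
                          .Where(g => g.Count() > 1)
                          .Select(g => g.Key)
                          .ToList();                               // 1, 2
var uniqueValues = list.GroupBy(x => x)
                       .Where(g => g.Count() == 1)
                       .Select(g => g.Key)
                       .ToList();                                  // 3, 4

Console.WriteLine($"anyDuplicate={anyDuplicate}, allUnique={allUnique}");
Console.WriteLine("duplicates: " + string.Join(", ", duplicateValues));
Console.WriteLine("unique: " + string.Join(", ", uniqueValues));
```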
6 Comments
var unique = list.Distinct(x => x)
Distinct works differently, in that it will not just return the values which appear only once, but also the values which appear multiple times (but it will return them only once instead of all of the multiple times); which is different from what the answer was referring to.
.All(g => g.Count() == 1) should be .Where(g => g.Count() == 1). All would not "find the unique values" as you suggest, it would confirm that there are no duplicates in the entire list (= that all groups have a count of 1)
Another way is using HashSet:
var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));
If you want unique values in your duplicates list:
var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();
Here is the same solution as a generic extension method:
public static class Extensions
{
    public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
    {
        var hash = new HashSet<TKey>(comparer);
        return source.Where(item => !hash.Add(selector(item))).ToList();
    }
    public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
    {
        return source.GetDuplicates(x => x, comparer);
    }
    public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        return source.GetDuplicates(selector, null);
    }
    public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.GetDuplicates(x => x, null);
    }
}
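One thing to watch with the HashSet approach: the Where query is evaluated lazily and captures a mutable set, so enumerating it a second time gives different results. That is why the extension method above calls ToList. A minimal sketch of the effect, assuming C# 9 top-level statements (variable names are mine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3, 4, 5, 2 };

var seen = new HashSet<int>();
IEnumerable<int> lazy = list.Where(i => !seen.Add(i));

// First enumeration: only the second 2 fails to be added.
int firstPass = lazy.Count();   // 1
// Second enumeration: the set already holds every value, so every Add fails.
int secondPass = lazy.Count();  // 6

// Materializing once with a fresh set avoids the problem;
// Distinct collapses values that occur three or more times.
var fresh = new HashSet<int>();
List<int> duplicates = list.Where(i => !fresh.Add(i)).Distinct().ToList();

Console.WriteLine($"{firstPass} {secondPass}");      // 1 6
Console.WriteLine(string.Join(", ", duplicates));    // 2
```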
6 Comments
List<int> { 1, 2, 3, 4, 5, 2 } as the source, the result is an IEnumerable<int> with one element having the value of 1 (where the correct duplicate value is 2)
Console.WriteLine("Count: {0}", duplicates.Count()); directly below it and it prints 6. Unless I'm missing something about the requirements for this function, there should only be 1 item in the resulting collection.
ToList in order to fix the issue, but it means that the method is executed as soon as it is called, and not when you iterate over the results.
var hash = new HashSet<int>(); var duplicates = list.Where(i => !hash.Add(i)); will lead to a list that includes all occurrences of duplicates. So if you have four occurrences of 2 in your list, then your duplicate list will contain three occurrences of 2, since only one of the 2's can be added to the HashSet. If you want your list to contain unique values for each duplicate, use this code instead: var duplicates = mylist.Where(item => !myhash.Add(item)).ToList().Distinct().ToList();
You can do this:
var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();
With these extension methods:
public static class Extensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }
    // Only TSource is declared here so that list.Duplicates() can infer its type argument.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.Duplicates(i => i);
    }
    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        // Dispose the enumerator once the check is done.
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }
}
Using IsMultiple() in the Duplicates method is faster than Count() because this does not iterate the whole collection.
7 Comments
Count() is pre computed and your solution is likely slower.
Count()] is basically different than iterating the whole list. Count() is pre-computed but iterating the whole list is not.
I created an extension in response to this that you could include in your projects. I think it covers most cases when you search for duplicates in a List or with LINQ.
Example:
//Dummy class to compare in list
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
public Person(int id, string name, string surname)
{
this.Id = id;
this.Name = name;
this.Surname = surname;
}
}
//The extension static class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        // Return only the second and subsequent repetitions
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); // Skip the first occurrence and return all the others
    }
    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        // Get all the items that have repeats
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) // Keep only the groups with more than one item
            .SelectMany(z => z); // Return every item of the remaining groups
    }
}
//how to use it:
void DuplicateExample()
{
//Populate List
List<Person> PersonsLst = new List<Person>(){
new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
new Person(2,"Ana","Figueiredo"),
new Person(3,"Ricardo","Figueiredo"), //second duplicate in the example
new Person(4,"Margarida","Figueiredo"),
new Person(5,"Ricardo","Figueiredo") //third duplicate in the example
};
Console.WriteLine("All:");
PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All:
1 -> Ricardo Figueiredo
2 -> Ana Figueiredo
3 -> Ricardo Figueiredo
4 -> Margarida Figueiredo
5 -> Ricardo Figueiredo
*/
Console.WriteLine("All lines with repeated data");
PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
All lines with repeated data
1 -> Ricardo Figueiredo
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
Console.WriteLine("Only Repeated more than once");
PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
.ToList()
.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
/* OUTPUT:
Only Repeated more than once
3 -> Ricardo Figueiredo
5 -> Ricardo Figueiredo
*/
}
2 Comments
One of the answers above did not work for me, and I did not understand why:
var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);
My solution in this situation looks like this:
var duplicates = model.list
.GroupBy(s => s.SAME_ID)
.Where(g => g.Count() > 1).Count() > 0;
if(duplicates) {
doSomething();
}
1 Comment
Just another approach:
To check only whether any duplicate exists:
bool hasAnyDuplicate = list.Count > list.Distinct().Count();
For the duplicate values:
List<string> duplicates = new List<string>();
duplicates.AddRange(list);
list.Distinct().ToList().ForEach(x => duplicates.Remove(x));
// for unique duplicate values:
duplicates.Distinct();
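To make the behaviour of this approach concrete, here is a small sketch with made-up sample data, assuming C# 9 top-level statements. Removing each distinct value once leaves one entry per extra occurrence:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: "a" occurs three times, "b" twice, "c" once.
var list = new List<string> { "a", "b", "a", "c", "b", "a" };

// Distinct() returns an IEnumerable<T>, so Count() must be called as a method.
bool hasAnyDuplicate = list.Count > list.Distinct().Count();  // true

// Copy the list, then remove the first occurrence of every distinct value.
var duplicates = new List<string>(list);
foreach (var value in list.Distinct())
{
    duplicates.Remove(value);
}
// duplicates now holds the extra occurrences: a, b, a

var uniqueDuplicates = duplicates.Distinct().ToList();        // a, b

Console.WriteLine(hasAnyDuplicate);
Console.WriteLine(string.Join(", ", duplicates));
Console.WriteLine(string.Join(", ", uniqueDuplicates));
```

List<T>.Remove is a linear scan, so this is probably only worth using on small lists.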
Comments
A complete set of LINQ to SQL extension methods for finding duplicates, checked against MS SQL Server. They do not use .ToList() or IEnumerable, so the queries execute in SQL Server rather than in memory; only the results are returned to memory.
public static class Linq2SqlExtensions {
public class CountOfT<T> {
public T Key { get; set; }
public int Count { get; set; }
}
public static IQueryable<TKey> Duplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => s.Key);
public static IQueryable<TSource> GetDuplicates<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).SelectMany(s => s);
public static IQueryable<CountOfT<TKey>> DuplicatesCounts<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(y => new CountOfT<TKey> { Key = y.Key, Count = y.Count() });
public static IQueryable<Tuple<TKey, int>> DuplicatesCountsAsTuble<TSource, TKey>(this IQueryable<TSource> source, Expression<Func<TSource, TKey>> groupBy)
=> source.GroupBy(groupBy).Where(w => w.Count() > 1).Select(s => Tuple.Create(s.Key, s.Count()));
}
Comments
This is a simpler way that does not use grouping: get the distinct elements, iterate over them, and count how often each one occurs in the list. If the count is greater than 1, the element appears more than once, so add it to Repeteditemlist.
var mylist = new List<int>() { 1, 1, 2, 3, 3, 3, 4, 4, 4 };
var distList= mylist.Distinct().ToList();
var Repeteditemlist = new List<int>();
foreach (var item in distList)
{
if(mylist.Count(e => e == item) > 1)
{
Repeteditemlist.Add(item);
}
}
foreach (var item in Repeteditemlist)
{
Console.WriteLine(item);
}
Expected output:
1
3
4
Comments
All the GroupBy answers are the simplest but won't be the most efficient. They're especially bad for memory performance as building large inner collections has allocation cost.
A decent alternative is HuBeZa's HashSet.Add based approach. It performs better.
If you don't care about nulls, something like this is the most efficient (both CPU and memory) as far as I can think:
public static IEnumerable<TProperty> Duplicates<TSource, TProperty>(
    this IEnumerable<TSource> source,
    Func<TSource, TProperty> duplicateSelector,
    IEqualityComparer<TProperty> comparer = null)
{
    comparer ??= EqualityComparer<TProperty>.Default;
    Dictionary<TProperty, int> counts = new Dictionary<TProperty, int>(comparer);
    foreach (var item in source)
    {
        TProperty property = duplicateSelector(item);
        counts.TryGetValue(property, out int count);
        switch (count)
        {
            case 0:
                counts[property] = ++count;
                break;
            case 1:
                counts[property] = ++count;
                yield return property;
                break;
        }
    }
}
The trick here is to avoid additional write costs once a value's count has reached 2: from then on the loop only performs the TryGetValue lookup and neither updates the dictionary nor yields the value again. Of course, you could keep updating the dictionary with the count if you also want the number of occurrences for each duplicated item. For nulls, you just need some additional handling, that's all.
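If you do want the number of occurrences as well, a rough sketch of that variation could look like the following. DuplicateCounts is a hypothetical helper of mine, not part of the answer; it is written as a local function under C# 9 top-level statements and, like the method above, does not handle null keys:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts every key, then keeps only keys seen more than once.
static Dictionary<TKey, int> DuplicateCounts<TSource, TKey>(
    IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector) where TKey : notnull
{
    var counts = new Dictionary<TKey, int>();
    foreach (var item in source)
    {
        TKey key = keySelector(item);
        counts.TryGetValue(key, out int count); // count stays 0 for an unseen key
        counts[key] = count + 1;
    }
    return counts.Where(pair => pair.Value > 1)
                 .ToDictionary(pair => pair.Key, pair => pair.Value);
}

var words = new[] { "a", "b", "a", "c", "a", "c" };
Dictionary<string, int> duplicateCounts = DuplicateCounts(words, w => w);

foreach (var pair in duplicateCounts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

Unlike the streaming version, this has to read the whole source before it can return anything, and it writes to the dictionary for every item.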
Comments
In case anyone is interested, an easy and simple way without using LINQ would be:
struct MyData {
public int id;
public string name;
}
var myList = new List<MyData> {
new MyData { id = 1, name = "a" },
new MyData { id = 2, name = "b" },
new MyData { id = 1, name = "c" },
new MyData { id = 3, name = "d" },
new MyData { id = 2, name = "e" },
new MyData { id = 1, name = "f" }
};
// map it to a dictionary with the key you want
// and increase the count when a duplicate key is found
var dic = new Dictionary<int, int>();
foreach (var item in myList) {
    if (dic.ContainsKey(item.id))
        dic[item.id]++;
    else
        dic.Add(item.id, 1);
}
// display result
foreach (var item in dic) {
    Console.WriteLine($"itemId: {item.Key}, count: {item.Value}");
}
Output:
itemId: 1, count: 3
itemId: 2, count: 2
itemId: 3, count: 1
Comments
Remove duplicates by key
myTupleList = myTupleList.GroupBy(tuple => tuple.Item1).Select(group => group.First()).ToList();
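A quick sketch of that one-liner with made-up tuples, assuming C# 9 top-level statements. On .NET 6 or later, Enumerable.DistinctBy should give the same result without building groups:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: two tuples share the key 1.
var myTupleList = new List<Tuple<int, string>>
{
    Tuple.Create(1, "a"),
    Tuple.Create(2, "b"),
    Tuple.Create(1, "c"),
};

// Keep the first tuple seen for each Item1.
var deduplicated = myTupleList.GroupBy(tuple => tuple.Item1)
                              .Select(group => group.First())
                              .ToList();               // (1, a), (2, b)

// .NET 6+ only: same result via DistinctBy.
var viaDistinctBy = myTupleList.DistinctBy(tuple => tuple.Item1).ToList();

foreach (var tuple in deduplicated)
{
    Console.WriteLine($"{tuple.Item1} -> {tuple.Item2}");
}
```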