
I have a List<T> where T : IComparable<T>

I want to write a List<T> GetFirstNElements(IList<T> list, int n) where T : IComparable<T> that returns the first n distinct largest elements (the list can have duplicates), using expression trees.

  • Why not use LINQ? And how exactly would expression trees help solve this problem? Commented Oct 2, 2011 at 20:08
  • Why? What's wrong with LINQ? Use the tool that fits. Commented Oct 2, 2011 at 20:09
  • @Elena: Why do you think expression trees would make things faster? When you say you're afraid it is inefficient, do you have evidence that it's too slow for what you want? Commented Oct 2, 2011 at 20:16
  • @Elena even with that clarification, I see no correlation to the use of expression trees here... Commented Oct 2, 2011 at 20:18
  • @Merlyn - not really; imagine the data "1,2,3,4,5,6,7"; distinct is "1,2,3,4,5,6,7"; take n=3 is "1,2,3"; sort (desc) is "3,2,1" - now, what happened to 7, 6, 5 and 4? (See the sketch below.) Commented Oct 2, 2011 at 20:46
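
To make the ordering point in that last comment concrete, here is a minimal LINQ sketch (my own illustration, not from the original post; it assumes a List<int> named list, a count n, and System.Linq in scope). The Distinct/OrderByDescending/Take calls have to run in that order, otherwise the largest values can be discarded before they are ever sorted:

// Correct order: de-dupe, sort descending, then take the first n.
var topN = list.Distinct()
               .OrderByDescending(x => x)
               .Take(n)
               .ToList();

// Wrong order (the case described in the comment above): Take(n) runs before
// the sort, so for 1..7 with n = 3 it yields 3,2,1 instead of 7,6,5.
var wrong = list.Distinct()
                .Take(n)
                .OrderByDescending(x => x)
                .ToList();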

3 Answers


In some performance-critical code I wrote recently, I had a very similar requirement: the candidate set was very large and the number needed very small. To avoid sorting the entire candidate set, I used a custom extension method that simply keeps the n largest items found so far in a linked list. Then I can simply:

  • loop once over the candidates
  • if I haven't yet found "n" items, or the current item is better than the worst already selected, then add it (at the correct position) in the linked-list (inserts are cheap here)
    • if we now have more than "n" selected, drop the worst (deletes are cheap here)

then we are done; at the end of this, the linked list contains the best "n" items, already sorted. No need to use expression trees, and no "sort a huge list" overhead. Something like:

public static IEnumerable<T> TakeTopDistinct<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new ArgumentNullException("source");
    if (count < 0) throw new ArgumentOutOfRangeException("count");
    if (count == 0) yield break;

    var comparer = Comparer<T>.Default;
    LinkedList<T> selected = new LinkedList<T>();

    foreach(var value in source)
    {
        if(selected.Count < count // need to fill
            || comparer.Compare(selected.Last.Value, value) < 0 // better candidate
            )
        {
            var tmp = selected.First;
            bool add = true;
            while (tmp != null)
            {
                var delta = comparer.Compare(tmp.Value, value);
                if (delta == 0)
                {
                    add = false; // not distinct
                    break;
                }
                else if (delta < 0)
                {
                    selected.AddBefore(tmp, value);
                    add = false;
                    if(selected.Count > count) selected.RemoveLast();
                    break;
                }
                tmp = tmp.Next;
            }
            if (add && selected.Count < count) selected.AddLast(value);
        }
    }
    foreach (var value in selected) yield return value;
}
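
For what it's worth, a quick usage sketch (the sample values and n = 3 are just illustrative, not from the original answer):

var data = new[] { 3, 7, 7, 1, 9, 9, 4, 2 };

// Prints 9, 7, 4: the three largest distinct values, largest first.
foreach (var item in data.TakeTopDistinct(3))
{
    Console.WriteLine(item);
}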

5 Comments

Will this be any logically different than or guaranteed better perf than: list.Distinct().Take(n).OrderBy(t => t)?
@Merlyn yes; firstly it will be correct - if you Take and then OrderBy, you are only sorting the first n items that are in the sequence, which is nothing like getting the top n items. Second, sorting is expensive (for a large list) - at O(m*log(m)) (for size of list m) the above should be much closer to O(m) for small n. It is, however, a little bit vulnerable to pathological edge-cases - but note that it is also space efficient; it doesn't require much working space.
+1; I get the difference between sort-then-take vs take-then-sort. I was getting the impression the OP was trying to avoid an O(N) operation. Seeing that this isn't the case, I think I get that your solution is still O(N), but is space efficient, and with a low N and good candidate list, might still be time efficient.
@Merlyn a regular OrderBy is O(m*log(m)) note, so getting down to O(m) is a good thing
Now I'm being vague :) I believed it could be nearly O(n) instead of O(m) or greater. It can't, so your implementation seems ace (besides proof of pathological issues causing a high amortized cost, which I am in no position to provide, and I have no reason yet to believe. You're already handling the pre-sorted cases, so no worries there).

If I understand the question right, you just want to sort the entries in the list.
Wouldn't it be possible for you to implement IComparable<T> and use the Sort method of List<T>?
The code in your IComparable<T> implementation can handle the tree comparison and anything else you want to use to compare and sort, so at that point you can just rely on the Sort mechanism.

List<T> GetFirstNElements<T>(IList<T> list, int n) where T : IComparable<T>
{
    // IList<T> has no Sort(), so copy into a List<T> first (this also avoids mutating the input).
    List<T> sorted = new List<T>(list);
    sorted.Sort();
    sorted.Reverse(); // largest first
    // Note: duplicates are not removed here.
    List<T> returnList = new List<T>();
    for (int i = 0; i < n && i < sorted.Count; i++)
    {
        returnList.Add(sorted[i]);
    }
    return returnList;
}

Wouldn't be the fastest code ;-)

Comments


The standard algorithm for this, which takes expected time O(list.Length), is described on Wikipedia as "quickfindFirstK" on this page:

http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements

This improves on @Marc Gravell's answer because its expected running time is linear in the length of the input list, regardless of the value of n.
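
For illustration, here is a rough quickselect-style sketch in C# (my own code, not taken from the Wikipedia page; it handles the question's "distinct" requirement by de-duplicating up front, and assumes System, System.Collections.Generic and System.Linq are in scope):

// Sketch only: expected O(m) selection of the n largest distinct values,
// followed by sorting just those n values.
static List<T> TopNDistinct<T>(IEnumerable<T> source, int n) where T : IComparable<T>
{
    if (n <= 0) return new List<T>();
    var items = source.Distinct().ToArray();      // handle "distinct" up front
    if (n >= items.Length)
    {
        var all = items.ToList();
        all.Sort((a, b) => b.CompareTo(a));       // descending
        return all;
    }

    QuickSelect(items, 0, items.Length - 1, n);   // n largest end up in items[0..n-1]

    var top = items.Take(n).ToList();
    top.Sort((a, b) => b.CompareTo(a));           // descending
    return top;
}

// Partially orders the array so that the n largest items occupy the first n slots.
static void QuickSelect<T>(T[] items, int left, int right, int n) where T : IComparable<T>
{
    var rng = new Random();
    while (left < right)
    {
        int pivotIndex = left + rng.Next(right - left + 1);
        pivotIndex = Partition(items, left, right, pivotIndex);
        if (pivotIndex == n - 1) return;
        if (pivotIndex < n - 1) left = pivotIndex + 1;
        else right = pivotIndex - 1;
    }
}

// Moves everything larger than the pivot to its left; returns the pivot's final index.
static int Partition<T>(T[] items, int left, int right, int pivotIndex) where T : IComparable<T>
{
    T pivot = items[pivotIndex];
    Swap(items, pivotIndex, right);
    int store = left;
    for (int i = left; i < right; i++)
    {
        if (items[i].CompareTo(pivot) > 0)        // '>' puts larger items first
        {
            Swap(items, i, store++);
        }
    }
    Swap(items, store, right);
    return store;
}

static void Swap<T>(T[] items, int i, int j)
{
    T tmp = items[i];
    items[i] = items[j];
    items[j] = tmp;
}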

Comments
