
I have a List<T> where T : IComparable<T>

I want to write a List<T> GetFirstNElements(IList<T> list, int n) where T : IComparable<T> that returns the first n distinct largest elements (the list can have duplicates), using expression trees.

  • Why not use LINQ? And how exactly would expression trees help solve this problem? Commented Oct 2, 2011 at 20:08
  • Why? What's wrong with LINQ? Use the tool that fits. Commented Oct 2, 2011 at 20:09
  • @Elena: Why do you think expression trees would make things faster? When you say you're afraid it is inefficient, do you have evidence that it's too slow for what you want? Commented Oct 2, 2011 at 20:16
  • @Elena even with that clarification, I see no correlation to the use of expression trees here... Commented Oct 2, 2011 at 20:18
  • @Merlyn - not really; imagine the data "1,2,3,4,5,6,7"; distinct is "1,2,3,4,5,6,7"; take n=3 is "1,2,3"; sort (desc) is "3,2,1" - now, what happened to 7, 6, 5 and 4? (See the sketch below.) Commented Oct 2, 2011 at 20:46
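
To make the ordering point in that last comment concrete, here is a minimal LINQ sketch (my own illustration, not from the original post; it assumes a List<int> named list, a count n, and System.Linq in scope). The Distinct/OrderByDescending/Take calls have to run in that order, otherwise the largest values can be discarded before they are ever sorted:

// Correct order: de-dupe, sort descending, then take the first n.
var topN = list.Distinct()
               .OrderByDescending(x => x)
               .Take(n)
               .ToList();

// Wrong order (the case described in the comment above): Take(n) runs before
// the sort, so for 1..7 with n = 3 it yields 3,2,1 instead of 7,6,5.
var wrong = list.Distinct()
                .Take(n)
                .OrderByDescending(x => x)
                .ToList();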

3 Answers


In some performance-critical code I wrote recently, I had a very similar requirement: the candidate set was very large and the number needed very small. To avoid sorting the entire candidate set, I used a custom extension method that simply keeps the n largest items found so far in a linked list. Then I can simply:

  • loop once over the candidates
  • if I haven't yet found "n" items, or the current item is better than the worst already selected, then add it (at the correct position) in the linked-list (inserts are cheap here)
    • if we now have more than "n" selected, drop the worst (deletes are cheap here)

then we are done; at the end of this, the linked list contains the best "n" items, already sorted. No need to use expression trees, and no "sort a huge list" overhead. Something like:

public static IEnumerable<T> TakeTopDistinct<T>(this IEnumerable<T> source, int count)
{
    if (source == null) throw new ArgumentNullException("source");
    if (count < 0) throw new ArgumentOutOfRangeException("count");
    if (count == 0) yield break;

    var comparer = Comparer<T>.Default;
    LinkedList<T> selected = new LinkedList<T>();

    foreach(var value in source)
    {
        if(selected.Count < count // need to fill
            || comparer.Compare(selected.Last.Value, value) < 0 // better candidate
            )
        {
            var tmp = selected.First;
            bool add = true;
            while (tmp != null)
            {
                var delta = comparer.Compare(tmp.Value, value);
                if (delta == 0)
                {
                    add = false; // not distinct
                    break;
                }
                else if (delta < 0)
                {
                    selected.AddBefore(tmp, value);
                    add = false;
                    if(selected.Count > count) selected.RemoveLast();
                    break;
                }
                tmp = tmp.Next;
            }
            if (add && selected.Count < count) selected.AddLast(value);
        }
    }
    foreach (var value in selected) yield return value;
}
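
For what it's worth, a quick usage sketch (the sample values and n = 3 are just illustrative, not from the original answer):

var data = new[] { 3, 7, 7, 1, 9, 9, 4, 2 };

// Prints 9, 7, 4: the three largest distinct values, largest first.
foreach (var item in data.TakeTopDistinct(3))
{
    Console.WriteLine(item);
}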

5 Comments

Will this be any logically different than or guaranteed better perf than: list.Distinct().Take(n).OrderBy(t => t)?
@Merlyn yes; firstly it will be correct - if you Take and then OrderBy, you are only sorting the first n items that are in the sequence, which is nothing like getting the top n items. Second, sorting is expensive (for a large list) - at O(m*log(m)) (for size of list m) the above should be much closer to O(m) for small n. It is, however, a little bit vulnerable to pathological edge-cases - but note that it is also space efficient; it doesn't require much working space.
+1; I get the difference between sort-then-take vs take-then-sort. I was getting the impression the OP was trying to avoid an O(N) operation. Seeing that this isn't the case, I think I get that your solution is still O(N), but is space efficient, and with a low N and good candidate list, might still be time efficient.
@Merlyn a regular OrderBy is O(m*log(m)) note, so getting down to O(m) is a good thing
Now I'm being vague :) I believed it could be nearly O(n) instead of O(m) or greater. It can't, so your implementation seems ace (besides proof of pathological issues causing a high amortized cost, which I am in no position to provide, and I have no reason yet to believe. You're already handling the pre-sorted cases, so no worries there).

If I understand the question right, you just want to sort the entries in the list.
Wouldn't it be possible for you to implement IComparable<T> and use the Sort method of List<T>?
The code in your IComparable<T> implementation can handle the tree comparison and anything else you want to use to compare and sort, so at that point you can just rely on the Sort mechanism.

List<T> GetFirstNElements<T>(IList<T> list, int n) where T : IComparable<T>
{
    // IList<T> has no Sort(), so copy into a List<T> first (this also avoids mutating the input).
    List<T> sorted = new List<T>(list);
    sorted.Sort();
    sorted.Reverse(); // largest first
    // Note: duplicates are not removed here.
    List<T> returnList = new List<T>();
    for (int i = 0; i < n && i < sorted.Count; i++)
    {
        returnList.Add(sorted[i]);
    }
    return returnList;
}

Wouldn't be the fastest code ;-)

Comments


The standard algorithm for this, which takes expected time O(list.Length), is described on Wikipedia as "quickfindFirstK" on this page:

http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements

This improves on @Marc Gravell's answer because its expected running time is linear in the length of the input list, regardless of the value of n.
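
For illustration, here is a rough quickselect-style sketch in C# (my own code, not taken from the Wikipedia page; it handles the question's "distinct" requirement by de-duplicating up front, and assumes System, System.Collections.Generic and System.Linq are in scope):

// Sketch only: expected O(m) selection of the n largest distinct values,
// followed by sorting just those n values.
static List<T> TopNDistinct<T>(IEnumerable<T> source, int n) where T : IComparable<T>
{
    if (n <= 0) return new List<T>();
    var items = source.Distinct().ToArray();      // handle "distinct" up front
    if (n >= items.Length)
    {
        var all = items.ToList();
        all.Sort((a, b) => b.CompareTo(a));       // descending
        return all;
    }

    QuickSelect(items, 0, items.Length - 1, n);   // n largest end up in items[0..n-1]

    var top = items.Take(n).ToList();
    top.Sort((a, b) => b.CompareTo(a));           // descending
    return top;
}

// Partially orders the array so that the n largest items occupy the first n slots.
static void QuickSelect<T>(T[] items, int left, int right, int n) where T : IComparable<T>
{
    var rng = new Random();
    while (left < right)
    {
        int pivotIndex = left + rng.Next(right - left + 1);
        pivotIndex = Partition(items, left, right, pivotIndex);
        if (pivotIndex == n - 1) return;
        if (pivotIndex < n - 1) left = pivotIndex + 1;
        else right = pivotIndex - 1;
    }
}

// Moves everything larger than the pivot to its left; returns the pivot's final index.
static int Partition<T>(T[] items, int left, int right, int pivotIndex) where T : IComparable<T>
{
    T pivot = items[pivotIndex];
    Swap(items, pivotIndex, right);
    int store = left;
    for (int i = left; i < right; i++)
    {
        if (items[i].CompareTo(pivot) > 0)        // '>' puts larger items first
        {
            Swap(items, i, store++);
        }
    }
    Swap(items, store, right);
    return store;
}

static void Swap<T>(T[] items, int i, int j)
{
    T tmp = items[i];
    items[i] = items[j];
    items[j] = tmp;
}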

Comments
