I want to optimize this code for counting the number of occurrences in a list of strings. To be specific, I have two lists

1) cat: a huge list of strings with duplicates (duplicates must exist).

2) cat_unq: the distinct elements from cat.

What I am currently doing in my code is looping over all unique elements in cat_unq and counting how many times each one occurs in cat. The search runs on a mobile device.

I already tried switching from lists to arrays, but the performance was only slightly better and still not sufficient.

I also tried a parallel search using Parallel.ForEach, but the performance was not stable.

Here is the code I am currently using:

private List<int> GetCategoryCount(List<string> cat, List<string> cat_unq)
{
    List<int> cat_count = new List<int>();
    for (int i = 0; i < cat_unq.Count; i++)
        cat_count.Add(cat.Where(x => x.Equals(cat_unq[i])).Count());
    return cat_count;
}
  • Does cat_unq have all the unique values from cat or a subset? If the former, you can just do a grouping on cat and get the count of each occurrence. And even if it's the latter, it would be better to get those counts in one pass of the cat list and then use it to get your counts in the desired order. Commented Sep 13, 2019 at 11:12
  • @TimSchmelter That wouldn't help much here since they iterate all the values in cat_unq and then iterate the values in cat. Commented Sep 13, 2019 at 11:14
  • @TimSchmelter It depends. Maybe cat_unq represents some type of desired order. But really it almost sounds like they might be getting the distinct values up front and then doing this which is even more work than is needed. Commented Sep 13, 2019 at 11:16

1 Answer

It is slow because you scan the entire cat list once for every unique name (cat.Where(...).Count()), so the work grows with cat.Count × cat_unq.Count.

Instead, group the cat list (the one with duplicates) once and turn the groups into a dictionary of counts. Then the count for each unique name is a single dictionary lookup.

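private List<int> GetCategoryCount(List<string> cat, List<string> cat_unq)
{
    // One pass over cat: map each distinct name to its number of occurrences.
    var catsDict = cat.GroupBy(x => x).ToDictionary(k => k.Key, v => v.Count());
    // Read the counts back in cat_unq order; throws KeyNotFoundException if a name is not in cat.
    return cat_unq.Select(c => catsDict[c]).ToList();
}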
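
If the LINQ grouping is still too heavy for a mobile device, the same single pass can be written as a plain loop. GroupBy keeps the elements of every group in memory before they are counted, whereas a dictionary of counters stores only one integer per distinct name. This is a minimal sketch, not a measured improvement: the class name CategoryCounter and the choice to return 0 for a name that is missing from cat are my assumptions, so profile it on the target device.

```csharp
using System.Collections.Generic;

public static class CategoryCounter
{
    // Hypothetical helper: counts every string in cat in one pass,
    // then reads the counts back in the order given by cat_unq.
    public static List<int> GetCategoryCount(List<string> cat, List<string> cat_unq)
    {
        var counts = new Dictionary<string, int>();
        foreach (string name in cat)
        {
            // TryGetValue leaves current at 0 when the name has not been seen yet.
            counts.TryGetValue(name, out int current);
            counts[name] = current + 1;
        }

        var result = new List<int>(cat_unq.Count);
        foreach (string name in cat_unq)
        {
            // Assumption: a name that never occurs in cat gets a count of 0
            // instead of throwing KeyNotFoundException.
            counts.TryGetValue(name, out int count);
            result.Add(count);
        }
        return result;
    }
}
```

Calling CategoryCounter.GetCategoryCount(cat, cat_unq) returns one count per entry of cat_unq, in the same order as cat_unq.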

Note that if you build the list of unique cat names elsewhere just for this, that step is pointless: the grouping already produces them (the dictionary has the unique cat names as keys).

// No need for a separate list of unique names
private List<int> GetCategoryCount(List<string> cat)
{
    return cat.GroupBy(x => x).Select(g => g.Count()).ToList();
}

Or maybe what you actually wanted was to get back all the unique names together with their counts:

// No need for a separate list of unique names - as this one returns it with the counts in a dictionary
private Dictionary<string,int> GetCategoryCount(List<string> cat)
{
    return cat.GroupBy(x => x).ToDictionary(k => k.Key, v => v.Count());
}
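
If cat_unq holds only a subset of the names in cat, counting just those names avoids building groups for categories nobody asked for. The following is a minimal sketch under that assumption. SubsetCategoryCounter and CountOnlyRequested are hypothetical names, and whether this actually beats the grouped version depends on the data, so it is worth profiling.

```csharp
using System.Collections.Generic;

public static class SubsetCategoryCounter
{
    // Hypothetical helper: returns a count for each name in cat_unq only.
    public static Dictionary<string, int> CountOnlyRequested(List<string> cat, List<string> cat_unq)
    {
        // Seed one counter per requested name, starting at 0.
        var counts = new Dictionary<string, int>(cat_unq.Count);
        foreach (string name in cat_unq)
            counts[name] = 0;

        foreach (string name in cat)
        {
            // Names that were not requested are skipped without storing anything.
            if (counts.TryGetValue(name, out int current))
                counts[name] = current + 1;
        }
        return counts;
    }
}
```

A requested name that never appears in cat stays in the result with a count of 0.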

2 Comments

One thing to note: this will be reasonably fast when cat and cat_unq contain roughly the same number of categories. If cat_unq contains significantly fewer categories than cat, then this might do a lot of unnecessary grouping. But I would recommend profiling for this kind of scenario.
@Euphoric yes (oh, and thanks for the edit!). Although we'd need to have a lot of cats for the extra cycles in grouping to make a difference.
