5

I have a String Array X and a List Y, and I want to remove everything in Y from the List X. What is the fastest way to do that?

e.g.: X: 1) "aaa.bbb.ccc" 2) "ddd.eee.fff" 3) "ggg.hhh.jjj"

Y: 1) "bbb" 2) "fff"

The result should be a new List in which only 3) exists, because X.1 gets deleted by Y.1 and X.2 gets deleted by Y.2.

How to do that?

I know I could do a foreach on List X and check each item against everything in List Y, but is that the fastest way?

4
  • 2
    Do you mean that you want to remove from x all elements which contain as substrings any of the elements of y? Also: You say "Array", do you mean "List"? Commented Oct 23, 2013 at 12:21
  • 1
    Should X1 also be removed if Y1 was only "bb"? Commented Oct 23, 2013 at 12:35
  • Yes, it should be cross references. Commented Oct 23, 2013 at 12:38
  • 1
    @Kovu Do you mean "Yes, if the thing to remove is "bb" then you should remove the "aaa.bbb.ccc" item"? (Even though "bb" is only a subset of "bbb") Commented Oct 23, 2013 at 12:46

5 Answers

9

The most convenient would be

var Z = X.Where(x => !x.Split('.').Intersect(Y).Any()).ToList();

That is not the same as "fastest". Probably the fastest (runtime) way to do that is to use a token search, like:

public static bool ContainsToken(string value, string token, char delimiter = '.')
{
    if (string.IsNullOrEmpty(token)) return false;
    if (string.IsNullOrEmpty(value)) return false;

    int lastIndex = -1, idx, endIndex = value.Length - token.Length, tokenLength = token.Length;
    while ((idx = value.IndexOf(token, lastIndex + 1)) > lastIndex)
    {
        lastIndex = idx;
        if ((idx == 0 || (value[idx - 1] == delimiter))
            && (idx == endIndex || (value[idx + tokenLength] == delimiter)))
        {
            return true;
        }
    }
    return false;
}

then something like:

var list = new List<string>(X.Length);
foreach(var x in X)
{
    bool found = false;
    foreach(var y in Y)
    {
        if(ContainsToken(x, y, '.'))
        {
            found = true;
            break;
        }
    }
    if (!found) list.Add(x);
}

This:

  • doesn't allocate arrays (for the output of Split, or for the params char[] of Split)
  • doesn't create any new string instances (for the output of Split)
  • doesn't use delegate abstraction
  • doesn't have captured scopes
  • uses the struct custom iterator of List<T> rather than the class iterator of IEnumerable<T>
  • starts the new List<T> with the appropriate worst-case size to avoid reallocations
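For reference, the two pieces above can be put together and run against the sample data from the question. This is only a sketch under a few assumptions: the class name TokenFilter and the helper RemoveMatching are made up for illustration, and the IndexOf call passes StringComparison.Ordinal, which is not in the code above (the two-argument overload used there does a culture-sensitive search).

```csharp
using System;
using System.Collections.Generic;

public static class TokenFilter
{
    // Same logic as ContainsToken above; the ordinal comparison is an assumption.
    public static bool ContainsToken(string value, string token, char delimiter = '.')
    {
        if (string.IsNullOrEmpty(token) || string.IsNullOrEmpty(value)) return false;

        int lastIndex = -1, idx, tokenLength = token.Length, endIndex = value.Length - tokenLength;
        while ((idx = value.IndexOf(token, lastIndex + 1, StringComparison.Ordinal)) > lastIndex)
        {
            lastIndex = idx;
            // A hit only counts when it is bounded by the delimiter or the string ends.
            bool startsAtBoundary = idx == 0 || value[idx - 1] == delimiter;
            bool endsAtBoundary = idx == endIndex || value[idx + tokenLength] == delimiter;
            if (startsAtBoundary && endsAtBoundary) return true;
        }
        return false;
    }

    // Hypothetical helper: returns a new list with the items of x that contain no token from y.
    public static List<string> RemoveMatching(string[] x, List<string> y)
    {
        var result = new List<string>(x.Length);
        foreach (var item in x)
        {
            bool found = false;
            foreach (var token in y)
            {
                if (ContainsToken(item, token)) { found = true; break; }
            }
            if (!found) result.Add(item);
        }
        return result;
    }
}
```

Calling TokenFilter.RemoveMatching with the question's X and Y leaves only "ggg.hhh.jjj", and ContainsToken("aaa.bbbb.ccc", "bbb") is false because "bbb" is not a whole token there.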

8 Comments

@DeeMac see edit, which avoids things like allocations due to Split
+1 then. Interesting to see your code on token search, I haven't seen that before.
@DeeMac it is actually taken from some stackoverflow.com code I was writing yesterday to replace some code that was looking for matches in the form "abc;def;ghij" - the old code was doing a Split, and we were seeing lots of overhead from repeated strings (and arrays) slowly filling memory - i.e. every request would cause an extra "abc", "def", "ghij", and a new string[3]. On stackoverflow.com, that fills up quickly...
Given your ContainsToken() then I think you can use it to remove all the matching items from the list like so: x.RemoveAll(s1 => y.Any(s2 => ContainsToken(s1, s2, '.'))); (if you want the original list to be modified)
@MatthewWatson well, the question stated "a new list"; also, I was trying to avoid any invisible allocations (such as a capture context) - but yes: that's a nice usage of RemoveAll
1

Iterating over X and Y would indeed be the fastest option because you have this Contains constraint. I really don't see any other way.

It should not be a foreach over X though, because you cannot modify the collection you iterate over with foreach.

So an option would be:

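// Assumes X and Y are both List<string>; an array has no RemoveAt.
for (int counterX = 0; counterX < X.Count; counterX++)
{
    for (int counterY = 0; counterY < Y.Count; counterY++)
    {
        if (X[counterX].Contains(Y[counterY]))
        {
            // Remove the match and step back so the item that shifts into this slot is checked too.
            X.RemoveAt(counterX--);
            break;
        }
    }
}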

This should do it (mind you, this code is not tested).
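A common variation on the same idea is to walk the list backwards, so that removing an item never shifts the ones still to be visited and the counter needs no adjustment. A minimal sketch, assuming both collections are List<string> and that substring matching via Contains is what is wanted; InPlaceFilter and RemoveContaining are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class InPlaceFilter
{
    // Removes from x, in place, every item that contains any string in y as a substring.
    public static void RemoveContaining(List<string> x, List<string> y)
    {
        // Iterate from the end: RemoveAt(i) only shifts items at indexes above i.
        for (int i = x.Count - 1; i >= 0; i--)
        {
            foreach (var token in y)
            {
                if (x[i].Contains(token))
                {
                    x.RemoveAt(i);
                    break; // x[i] is gone, so stop checking it
                }
            }
        }
    }
}
```

With the question's sample data this leaves only "ggg.hhh.jjj" in the list.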

1 Comment

I've proposed the same answer but got down-voted!? +1 for your answer, it's what I agree would be the best approach.
1

I think that a fairly fast approach would be to use List's built-in RemoveAll() method:

List<string> x = new List<string>
{
    "aaa.bbb.ccc",
    "ddd.eee.fff",
    "ggg.hhh.jjj"
};

List<string> y = new List<string>
{
    "bbb",
    "fff"
};

x.RemoveAll(s => y.Any(s.Contains));

(Note that I am assuming that you have two lists, x and y. Your OP mentions a string array but then goes on to talk about "List X" and "List Y", so I'm ignoring the string array bit.)

2 Comments

Contains here is unreliable because "aaa.bbbb.ccc" contains "bbb", but I wouldn't consider that a "match"
@MarcGravell The OP is ambiguous in this regard. As you can see, I asked for clarification.
1

Try this, using the Aggregate function:

    var xArr = new string[] { "aaa.bbb.ccc", "ddd.eee.fff", "ggg.hhh.jjj" };
    var yList = new List<string> { "bbb", "fff" };

    var result = xArr.Aggregate(new List<string> { }, (acc, next) =>
    {
        var elems = next.Split('.');
        foreach (var y in yList)
            if (elems.Contains(y))
                return acc;
        acc.Add(next);
        return acc;
    });
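If splitting is acceptable, the inner loop over yList could also be replaced by a hash lookup, so each token of an item is checked in roughly constant time. A minimal sketch, assuming exact token matches are wanted; SplitFilter and Filter are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;

public static class SplitFilter
{
    // Returns a new list with the items that have no '.'-separated part in tokens.
    public static List<string> Filter(IEnumerable<string> items, IEnumerable<string> tokens)
    {
        // Build the lookup once; ordinal comparison is an assumption.
        var blocked = new HashSet<string>(tokens, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (var item in items)
        {
            bool hit = false;
            foreach (var part in item.Split('.'))
            {
                if (blocked.Contains(part)) { hit = true; break; }
            }
            if (!hit) result.Add(item);
        }
        return result;
    }
}
```

This still pays for one Split per item, so it trades allocations for a simpler lookup when the token list is large.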

2 Comments

That's a whole lot of Splits... if the aim is convenience, it can be done in one line; if the aim is performance, then: there are better approaches
@MarcGravell, thanks, I improved the answer for doing just one split per iteration.
0

If you've got a relatively small list the performance ramifications wouldn't really be a big deal. This is the simplest foreach solution I could come up with.

List<string> ListZ = ListX.ToList();

foreach (string x in ListX)
{
    foreach (string y in ListY)
    {
        if (x.Contains(y))
        {
            ListZ.Remove(x);
            break; // stop at the first match so Remove is not called again for the same x
        }
    }
}

1 Comment

This gets a little tricky - if Y has "bbb", does that cause "aaa.bbbbb.ccc", to be removed? well, it would - but should it? (this is perhaps more a question for the OP)
