
I know the concept of String.Split has been addressed before with a multitude of different approaches, but I am specifically interested in a LINQ solution to this question.

I've attempted to write an extension class to handle the split, but both attempts have some major issues. So for the following:

string s = "ABCDEFGHIJKLMNOPQRSTUVWX";
var results = s.SplitEvery(4);

I would want a list like: { "ABCD", "EFGH", "IJKL", "MNOP", "QRST", "UVWX" }

Here is my extension class:

public static class Extensions
{
    public static List<string> SplitEvery(this string s, int n)
    {
        List<string> list = new List<string>();

        var Attempt1 = s.Select((c, i) => i % n== 0 ? s.Substring(i, n) : "|").Where(x => x != "|").ToList();

        var Attempt2 = s.Where((c, i) => i % n== 0).Select((c, i) => s.Substring(i, n)).ToList();

        return list;
    }
}

Attempt 1 inserts a dummy string "|" every time the condition isn't met, then removes all instances of the dummy string to create the final list. It works, but creating the dummy strings seems like an unnecessary extra step. Furthermore, this attempt fails if the string's length isn't evenly divisible by n, because the last Substring call reads past the end of the string.

Attempt 2 was me trying to select only substrings where the index was divisible by N, but the 'i' value in the Select statement doesn't correspond to the 'i' value in the Where statement, so I get results like: { "ABCD", "BCDE", etc... }
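
The index mismatch described above can be seen in a small sketch. The class and method names here (IndexDemo, IndicesSeenBySelect, SplitEveryScaled) are hypothetical and only for illustration. The index passed to Select counts positions in the filtered sequence, so one possible workaround is to multiply it by n and clamp the final length:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexDemo
{
    // Hypothetical helper: returns the indices that Select receives after the Where filter.
    public static List<int> IndicesSeenBySelect(string s, int n)
    {
        return s.Where((c, i) => i % n == 0)   // here i is the position in s: 0, n, 2n, ...
                .Select((c, i) => i)           // here i restarts at 0 for the filtered sequence
                .ToList();
    }

    // Hypothetical workaround: scale the restarted index back up by n
    // and clamp the length so the last chunk cannot run past the end.
    public static List<string> SplitEveryScaled(string s, int n)
    {
        return s.Where((c, i) => i % n == 0)
                .Select((c, i) => s.Substring(i * n, Math.Min(n, s.Length - i * n)))
                .ToList();
    }
}
```

Calling IndexDemo.IndicesSeenBySelect("ABCDEFGHIJKL", 4) gives 0, 1, 2 rather than 0, 4, 8, which is why Attempt 2 produces overlapping substrings.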

I feel like I'm close to a good solution, but could use a helpful nudge in the right direction. Any suggestions?

[Edit]

I ended up going with a combination of suggestions to handle my string-splitter. It might not be the fastest, but as a newbie to LINQ, this implementation was the most succinct and easy for me to understand.

public static List<string> SplitEvery(this string s, int size)
{
    return s.Select((x, i) => i)
        .Where(i => i % size == 0)
        .Select(i => String.Concat(s.Skip(i).Take(size))).ToList();
}

Thanks for all the excellent suggestions.

3
  • Side note: it would be nice to specify what your "better" criteria are. I.e. in this case they seem to be "query readable by a novice LINQ user that matches the description as closely as possible, preferring Enumerable methods over all performance considerations". In this light, Concat with Take would indeed look like the best approach. Commented Aug 9, 2013 at 17:01
  • My apologies, that is a fair assessment. I was mainly interested in a clean, one-liner approach similar to my original attempts above. In my case, readability was more important to me than scalability. Hopefully no one will try to dump an enormous text file into my string. :) Commented Aug 9, 2013 at 17:27
  • (My comment above is pure suggestion - nothing to apologize for). One more random note to watch out for in LINQ - your final approach iterates the sequence multiple times. That is fine for a string, but it is not going to work for "one-time" sequences like the result of a SQL query or File.ReadAllLines. There are several answers (i.e. with yield return) that demonstrate approaches that iterate the collection once. Commented Aug 9, 2013 at 18:31

8 Answers

27
string s = "ABCDEFGHIJKLMNOPQRSTUVWX";
var results = s.Select((c, i) => new { c, i })
            .GroupBy(x => x.i / 4)
            .Select(g => String.Join("",g.Select(y=>y.c)))
            .ToList();

You can also use MoreLINQ's Batch:

var res = s.Batch(4).Select(x => String.Join("", x)).ToList();

If you don't mind using side effects, this is possible too

var res2 = s.SplitEvery(4).ToList();

public static IEnumerable<string> SplitEvery(this string s, int n)
{
    int index = 0;
    return s.GroupBy(_=> index++/n).Select(g => new string(g.ToArray()));
}

And of course every string operation question deserves a Regex answer :)

var res3 = Regex.Split(s, @"(?<=\G.{4})");

1 Comment

I think the regex solution should be at the top of this answer, as it's faster (from my tests) and shorter than the other solutions here.
12

Here is another solution:

var result = s.Select((x, i) => i)
              .Where(i => i % 4 == 0)
              .Select(i => s.Substring(i, s.Length - i >= 4 ? 4 : s.Length - i));

1 Comment

YES. That's exactly what I was trying to get out of there. For me, this was the most straight-forward and readable option, since my knowledge of LINQ is pretty limited. Thanks so much!
10

You can use this extension method, which is implemented with simple Substring calls (I believe it is faster than enumerating over the characters and joining them into strings):

public static IEnumerable<string> SplitEvery(this string s, int length)
{
    int index = 0;
    while (index + length < s.Length)
    {
        yield return s.Substring(index, length);
        index += length;                
    }

    if (index < s.Length)
        yield return s.Substring(index, s.Length - index);
}

Comments

6
public static IEnumerable<string> SplitEvery(this string s, int length)
{
    return s.Where((c, index) => index % length == 0)
           .Select((c, index) => String.Concat(
                s.Skip(index * length).Take(length)
             )
           );
}

The jury is out on whether new String(chars.ToArray()) would be faster or slower for this than String.Concat(chars).
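
One way to settle the String.Concat versus new String question on a given machine is to time both. The following is only a rough sketch: the class ChunkJoinTiming and its methods are hypothetical names, and a Stopwatch loop like this is not a rigorous benchmark (no warm-up, no statistics), so treat any numbers as indicative at best.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ChunkJoinTiming
{
    // Joins the characters with String.Concat, as in the answer above.
    public static string ViaConcat(IEnumerable<char> chars)
    {
        return String.Concat(chars);
    }

    // Joins the characters by materialising an array first.
    public static string ViaArray(IEnumerable<char> chars)
    {
        return new String(chars.ToArray());
    }

    // Splits s into chunks of the given length (assumed > 0) using Skip/Take,
    // joins each chunk with the supplied function, and returns the elapsed time.
    public static TimeSpan Time(Func<IEnumerable<char>, string> join,
                                string s, int length, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int k = 0; k < iterations; k++)
        {
            for (int i = 0; i < s.Length; i += length)
            {
                join(s.Skip(i).Take(length));
            }
        }
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, ChunkJoinTiming.Time(ChunkJoinTiming.ViaConcat, s, 4, 100000) and the same call with ChunkJoinTiming.ViaArray can be compared side by side.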

You may of course append a .ToList() to return a List rather than IEnumerable.

3 Comments

I was worried the final .Take(length) would throw an out of index error, but it looks like that is all handled inside the method. Great solution!
Yeah, there's a bit of readability-impaired trickery which goes to show that it was late when I wrote this... I.e., the result of the Where call (= the char at each split index) is never used directly - it's only there to limit the number of results the following Select should return. The only Exception Take should ever throw is, as far as I recall, if the source you invoke it on is null. The rest of the time, it does The Sensible Thing.
... in other words, s.Where could be replaced with Enumerable.Range(0, x), where x would be the calculated number of split indices. See e.g. @AlexeiLevenkov's answer. That would more clearly communicate the intent.
4

Substring should be fine to select 4-character portions of the string. You just need to be careful with last portion:

new Func<string, int, IEnumerable<string>>(
        (string s, int n) => 
           Enumerable.Range(0, (s.Length + n-1)/n)
           .Select(i => s.Substring(i*n, Math.Min(n, s.Length - i*n)))) 
("ABCDEFGHIJKLMNOPQRSTUVWX", 4)

Note: if this answer is converted into an operation on a generic enumerable, it will have to iterate the collection multiple times (Length becomes Count(), and Substring becomes Skip(i*n).Take(n)).
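
To make the note above concrete, here is a minimal sketch of what such a generic conversion might look like. The names ChunkSketch and ChunksOf are hypothetical. The comments mark where the source is enumerated again, which is the cost the note warns about:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSketch
{
    // Hypothetical generic version of the Range/Substring approach.
    // Assumes n > 0 and that source can safely be enumerated more than once.
    public static IEnumerable<List<T>> ChunksOf<T>(this IEnumerable<T> source, int n)
    {
        int count = source.Count();                 // one full pass just to count
        return Enumerable.Range(0, (count + n - 1) / n)
            .Select(i => source.Skip(i * n)         // each chunk starts over from the beginning
                               .Take(n)
                               .ToList());
    }
}
```

On .NET 6 and later, the built-in Enumerable.Chunk method covers the same need in a single pass, so a hand-written version like this is mainly of interest on older frameworks.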

Comments

3

This seems to work:

public static IEnumerable<string> SplitEvery(this string s, int n) {
    var enumerators = Enumerable.Repeat(s.GetEnumerator(), n);
    while (true) {
        var chunk = string.Concat(enumerators
            .Where(e => e.MoveNext())
            .Select(e => e.Current));
        if (chunk == "") yield break;
        yield return chunk;
    }
}

Comments

1

Here's a couple of LINQy ways of doing it:

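public static IEnumerable<string> SplitEvery( this IEnumerable<char> s , int n )
{
  StringBuilder sb = new StringBuilder(n) ;  // requires using System.Text
  foreach ( char c in s )
  {
    if ( sb.Length == n )
    {
      // The buffer is full: emit it and start a new chunk.
      yield return sb.ToString() ;
      sb.Length = 0 ;
    }
    sb.Append(c) ;
  }
  // Emit the final chunk, which is still sitting in the buffer.
  if ( sb.Length > 0 )
  {
    yield return sb.ToString() ;
  }
}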

Or

public static IEnumerable<string> SplitEvery( this string s , int n )
{
  int limit = s.Length - ( s.Length % n ) ;
  int i = 0 ;

  while ( i < limit )
  {
    yield return s.Substring(i,n) ;
    i+=n ;
  }

  if ( i < s.Length )
  {
    yield return s.Substring(i) ;
  }

}

6 Comments

Curious how they're "LINQy"?
In order to be LINQy, you should use LINQ.
They're LINQ extension methods. You might want to read up on how to extend LINQ
I'm not going to get in a pi$$ing match with you. By Microsoft's definition, at the very least the first is an extension to LINQ. Have a nice day and thanks for playing.
This digs into the core stuff of LINQ, it's kind of custom LINQ or advanced LINQ, I don't think a newbie can understand this kind of job. Calling it LINQy is not really bad. +1 for that man.
1

This also works, but requires 'unwrapping' an IGrouping<x,y>:

public static IEnumerable<String> Split(this String me,int SIZE) {
  //Works by mapping the character index to a 'modulo Staircase'
  //and then grouping by that 'stair step' value
  return me.Select((c, i) => new {
    step = i - i % SIZE,
    letter = c.ToString()
  })
  .GroupBy(kvp => kvp.step)
  .Select(grouping => grouping
    .Select(g => g.letter)
    .Aggregate((a, b) => a + b)
  );
}

EDIT: Using the same lazy evaluation mechanism that LINQ relies on (C# iterator blocks with yield return), you can also achieve this using recursion:

public static IEnumerable<String> Split(this String me, int SIZE) {      
  if (me.Length > SIZE) {
    var head = me.Substring(0,SIZE);
    var tail = me.Substring(SIZE,me.Length-SIZE);
    yield return head;        
    foreach (var item in tail.Split(SIZE)) {
      yield return item; 
    }
  } else { 
    yield return me;
  }
}

Although, personally, I stay away from Substring because it encourages state-ful code (counters, indexes, etc. in the parent or global scopes).

1 Comment

Reading the answers, this method is nearly identical to the first answer from @I4V, except without either the flooring integer division or the empty-string join.
