How to split a string into a string array without losing the matched parts in C#?

Question

What I have

string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
If the last URL can also be split off as shown below, that would be better, but if not, that's OK.

s[3] = "We are the Best friends";
s[4] = "http://www.c.com";

What's the question
I tried to use the code below to split the string:

string[] s = Regex.Split(a, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result is not what I want: the Split method removes every substring that matches ImageRegPattern, and I want them to stay. I checked the Regex page on MSDN, and there seems to be no method that meets my need. So how can I do it?

I don't think there is any general solution for splitting that string (sure, you could craft some method to do it, but it would be very specific). You get nothing back from the Regex because it's splitting on the matches. Personally, I would change the format of the string: unless there's a good reason not to, you should just add a delimiter to the string.
– evanmcdonnal, Commented May 29, 2013 at 18:43
Given a comma-separated list, Regex.Split("1,2,3", ",") will return the array ["1","2","3"]. The pattern you supply defines the separator, not what you want to keep. Regex.Split is not what you want to use here. You're trying to keep the text and the separators, which is not what Split does.
– Jim Mischel, Commented May 29, 2013 at 18:49
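
To illustrate the point made in the comments, here is a minimal sketch of how Regex.Split treats the pattern as a separator. It also shows one documented behavior of .NET's Regex.Split that the comments do not mention: text captured by a capturing group in the pattern is included in the result array. The class and method names (SplitDemo, SplitDroppingUrls, SplitKeepingUrls) are hypothetical, and the empty-string filter is an assumption about what the asker wants.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Pattern copied from the question.
    private const string ImageRegPattern =
        @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";

    // Plain split: the matched URLs act as separators and are dropped.
    public static string[] SplitDroppingUrls(string source)
    {
        return Regex.Split(source, ImageRegPattern, RegexOptions.IgnoreCase)
                    .Where(part => part.Length > 0)
                    .ToArray();
    }

    // Wrapping the whole pattern in one capturing group makes Regex.Split
    // return the captured separators as array elements too.
    public static string[] SplitKeepingUrls(string source)
    {
        return Regex.Split(source, "(" + ImageRegPattern + ")", RegexOptions.IgnoreCase)
                    .Where(part => part.Length > 0) // a match at index 0 yields a leading ""
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in SplitKeepingUrls(a))
            Console.WriteLine(part);
    }
}
```

On the question's sample string, SplitKeepingUrls returns the four pieces listed under "What I want", while SplitDroppingUrls returns only the two text pieces.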

FishBasketGordo · Accepted Answer · 2013-05-29 19:25:27Z

You need something like this method, which finds all the matches first, and then collects them into a list along with the unmatched strings between them.

UPDATE: Added conditional to handle if no matches are found.

private static IEnumerable<string> InclusiveSplit
(
    string source, 
    string pattern
)
{
  List<string> parts = new List<string>();
  int currIndex = 0;

  // First, find all the matches. These are your separators.
  MatchCollection matches = 
      Regex.Matches(source, pattern, 
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

  // If there are no matches, there's nothing to split, so just return a
  // collection with just the source string in it.
  if (matches.Count < 1)
  {
    parts.Add(source);
  }
  else
  {
    foreach (Match match in matches)
    {
      // If the match begins after our current index, we need to add the
      // portion of the source string between the last match and the 
      // current match.
      if (match.Index > currIndex)
      {
        parts.Add(source.Substring(currIndex, match.Index - currIndex));
      }

      // Add the matched value, of course, to make the split inclusive.
      parts.Add(match.Value);

      // Update the current index so we know if the next match has an
      // unmatched substring before it.
      currIndex = match.Index + match.Length;
    }

    // Finally, check whether there is a bit of unmatched string at the end
    // of the source string.
    if (currIndex < source.Length)
      parts.Add(source.Substring(currIndex));
  }

  return parts;
}

The output for your example input will be like so:

[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
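
For the bonus part of the question, one possible approach is to keep the same find-the-matches-then-fill-the-gaps technique and widen the pattern. The sketch below uses a condensed iterator variant of InclusiveSplit and a hypothetical UrlPattern that tries image URLs first and then falls back to any run of URL characters after "http://". That fallback is an assumption: it only works when a non-image URL is followed by a character outside [\w./] or by the end of the string, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InclusiveSplitDemo
{
    // Hypothetical pattern: image URLs first, then any other "http://" run.
    public const string UrlPattern =
        @"http://[\w./]*\.(?:jpg|png|gif)|http://[\w./]+";

    // Condensed iterator variant of the InclusiveSplit method above.
    public static IEnumerable<string> InclusiveSplit(string source, string pattern)
    {
        int currIndex = 0;
        foreach (Match match in Regex.Matches(source, pattern, RegexOptions.IgnoreCase))
        {
            // Emit the unmatched text between the previous match and this one.
            if (match.Index > currIndex)
                yield return source.Substring(currIndex, match.Index - currIndex);

            // Emit the match itself so the split is inclusive.
            yield return match.Value;
            currIndex = match.Index + match.Length;
        }

        // Emit whatever is left after the last match (or the whole string
        // if nothing matched).
        if (currIndex < source.Length)
            yield return source.Substring(currIndex);
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in InclusiveSplit(a, UrlPattern))
            Console.WriteLine(part);
    }
}
```

With the sample input this yields five parts, with "We are the Best friends" and "http://www.c.com" as separate elements.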

HamZa · Accepted Answer · 2013-05-29 19:15:54Z

1

One does not simply underestimate the power of regex:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

(.*?) : group that lazily matches everything up to the first capital letter; this group holds the URL
( : start of the second group
- [A-Z] : match one capital letter
- [\w\s]+ : match one or more word characters (letters, digits, _) or whitespace characters (space, \n, \r, \t, \f)
- (?=http|$) : lookahead that checks whether what follows is http or the end of the line
) : close the group; this group holds the text

Note: This solution is for matching the string, not splitting it.
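
As a rough sketch of how this pattern could be applied from C#, the snippet below runs it through Regex.Matches and collects both groups of every match. The names MatchDemo and MatchParts are hypothetical. Note the assumption baked into the pattern: each text segment must start with a capital letter, so on the question's sample the trailing http://www.c.com is not part of any match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Collects group 1 (the URL) and group 2 (the text) of every match.
    public static List<string> MatchParts(string source)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(source, @"(.*?)([A-Z][\w\s]+(?=http|$))"))
        {
            // Group 1 is empty when a text segment starts right at the
            // current position, so skip it in that case.
            if (m.Groups[1].Length > 0)
                parts.Add(m.Groups[1].Value);
            parts.Add(m.Groups[2].Value);
        }
        return parts;
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in MatchParts(a))
            Console.WriteLine(part);
    }
}
```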


Dave Michener · Accepted Answer · 2013-05-29 18:59:53Z

0

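I think you need a multi-step process that inserts a delimiter, which can then be used by the String.Split method:

// Wrap each image URL in '|' delimiters, then split on the delimiter.
string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// Drop the leading delimiter so the array does not start with an empty string.
if (resultString.StartsWith("|"))
    resultString = resultString.Substring(1);
string[] parts = resultString.Split('|');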


melwil · Accepted Answer · 2013-05-29 18:59:53Z

0

The obvious answer here is of course not to use split, but rather matching the image patterns and retrieving them. That being said, it's not impossible to use split.

string ImageRegPattern = @"(?=http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif)|(?<=\.jpg|\.png|\.gif)";

This will match any point in the string that is either followed by an image URL or preceded by .jpg, .gif or .png. The lookarounds contain no capturing groups, because Regex.Split adds any captured text to the result array.

I really don't recommend doing it this way, I'm just saying you can.
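
For completeness, here is a hedged sketch of what splitting on zero-width lookarounds could look like. The names LookaroundSplitDemo and SplitOnBoundaries are hypothetical, and the pattern is written without capturing groups so that Regex.Split does not add captured text to the result. Because the first boundary sits at index 0, Regex.Split returns a leading empty string, which the sketch filters out.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Zero-width boundaries: just before an image URL, or just after an
    // image extension. No capturing groups are used.
    private const string BoundaryPattern =
        @"(?=http://[\w./]*?\.(?:jpg|png|gif))|(?<=\.(?:jpg|png|gif))";

    public static string[] SplitOnBoundaries(string source)
    {
        return Regex.Split(source, BoundaryPattern)
                    .Where(part => part.Length > 0) // drop the "" produced by a boundary at index 0
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in SplitOnBoundaries(a))
            Console.WriteLine(part);
    }
}
```

One limitation of this sketch: the lookbehind splits after any occurrence of .jpg, .png or .gif, even one that is not the end of an image URL, which is part of why this approach is fragile.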
