4

What I have

string ImageRegPattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
string a ="http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

What I want

string[] s;
s[0] = "http://www.dsa.com/asd/jpg/good.jpg";
s[1] = "This is a good day";
s[2] = "http://www.a.com/b.png";
s[3] = "We are the Best friendshttp://www.c.com";

Bonus:
If the trailing URL can also be split off as shown below, that would be better, but it's fine if not.

s[3] = "We are the Best friends";
s[4] = "http://www.c.com";

What's the question
I tried to use the code below to split the string:

string[] s = Regex.Split(a, ImageRegPattern, RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

But the result is not what I want: the Split method seems to remove every substring that matches ImageRegPattern, and I want those substrings kept in the result. I checked the Regex page on MSDN, and there seems to be no method that does this directly. How can I do it?

2
  • 1
    I don't think there is any general solution for splitting that string (sure, you could craft a method to do it, but it would be very specific). You don't get the URLs back from the regex because Regex.Split splits on the matches and discards them. Personally, I would change the format of the string: unless there's a good reason not to, just add a delimiter to it. Commented May 29, 2013 at 18:43
  • 2
    Given a comma-separated list, Regex.Split("1,2,3", ",") will return the array ["1","2","3"]. The pattern you supply defines the separator, not what you want to keep. Regex.Split is not what you want to use here. You're trying to keep the text and the separators, which is not what Split does. Commented May 29, 2013 at 18:49

4 Answers

4

You need something like this method, which finds all the matches first, and then collects them into a list along with the unmatched strings between them.

UPDATE: Added conditional to handle if no matches are found.

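// Needs these directives at the top of the file:
// using System.Collections.Generic;
// using System.Text.RegularExpressions;
private static IEnumerable<string> InclusiveSplit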
(
    string source, 
    string pattern
)
{
  List<string> parts = new List<string>();
  int currIndex = 0;

  // First, find all the matches. These are your separators.
  MatchCollection matches = 
      Regex.Matches(source, pattern, 
      RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

  // If there are no matches, there's nothing to split, so just return a
  // collection with just the source string in it.
  if (matches.Count < 1)
  {
    parts.Add(source);
  }
  else
  {
    foreach (Match match in matches)
    {
      // If the match begins after our current index, we need to add the
      // portion of the source string between the last match and the 
      // current match.
      if (match.Index > currIndex)
      {
        parts.Add(source.Substring(currIndex, match.Index - currIndex));
      }

      // Add the matched value, of course, to make the split inclusive.
      parts.Add(match.Value);

      // Update the current index so we know if the next match has an
      // unmatched substring before it.
      currIndex = match.Index + match.Length;
    }

    // Finally, check if there is a bit of unmatched string at the end of the 
    // source string.
    if (currIndex < source.Length)
      parts.Add(source.Substring(currIndex));
  }

  return parts;
}

The output for your example input will be like so:

[0] "http://www.dsa.com/asd/jpg/good.jpg"
[1] "This is a good day"
[2] "http://www.a.com/b.png"
[3] "We are the Best friendshttp://www.c.com"
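
For comparison, here is a shorter sketch of a different technique from the method above: Regex.Split includes captured text in its result when the pattern contains capturing parentheses, so wrapping the whole pattern in one group keeps the separators. The class and method names (CaptureSplitDemo, SplitKeepingMatches) are made up for illustration, and the sketch assumes the pattern has no capturing groups of its own.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // Hypothetical helper: wraps the pattern in a single capturing group so
    // that Regex.Split returns the matched separators along with the text
    // between them. Assumes the pattern contains no capturing groups itself.
    public static string[] SplitKeepingMatches(string source, string pattern)
    {
        return Regex.Split(source, "(" + pattern + ")", RegexOptions.IgnoreCase)
                    .Where(part => part.Length > 0) // drop empty entries at the edges
                    .ToArray();
    }

    public static void Main()
    {
        string pattern = @"http://[\w\.\/]*\.jpg|http://[\w\.\/]*\.png|http://[\w\.\/]*\.gif";
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";

        foreach (string part in SplitKeepingMatches(a, pattern))
            Console.WriteLine(part);
    }
}
```

For the example input this should produce the same four parts as listed above. The empty-entry filter is there because a match at the very start of the input leaves an empty string at the front of the array.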

1

One does not simply underestimate the power of:

(.*?)([A-Z][\w\s]+(?=http|$))

Explanation:

  • (.*?) : group and match everything until capital letter found, in this group you'll find the url
  • ( : start group
    • [A-Z] : match one capital letter
    • [\w\s]+ : match one or more word characters (letters, digits, _) or whitespace characters (space, \n, \r, \t, \f)
    • (?=http|$) : lookahead, check that what follows is http or the end of the string (the end of a line only with RegexOptions.Multiline)
    • ) : close group (here you'll find the text)

Online demo

Note: This solution is for matching the string, not splitting it.
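
A minimal sketch of how the two groups could be read back with Regex.Matches. The names MatchGroupsDemo and CollectParts are hypothetical, and the sketch relies on the same assumptions as the pattern itself: each text segment starts with a capital letter and contains only word characters and whitespace, and IgnoreCase is not set (it would make [A-Z] match lowercase letters too). Whatever is left after the last match is appended as a final part.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchGroupsDemo
{
    // Hypothetical helper: group 1 holds the URL, group 2 the text after it.
    public static List<string> CollectParts(string source)
    {
        var parts = new List<string>();
        int end = 0; // index just past the last match

        foreach (Match m in Regex.Matches(source, @"(.*?)([A-Z][\w\s]+(?=http|$))"))
        {
            if (m.Groups[1].Length > 0)
                parts.Add(m.Groups[1].Value); // the URL, if any
            parts.Add(m.Groups[2].Value);     // the text segment
            end = m.Index + m.Length;
        }

        // Anything after the last match (for example a trailing URL).
        if (end < source.Length)
            parts.Add(source.Substring(end));

        return parts;
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in CollectParts(a))
            Console.WriteLine(part);
    }
}
```

With the example input, the trailing http://www.c.com is not covered by any match, so it ends up as its own last element.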

0

I think you need a multi-step process to insert a delimiter that can then be used by the String.Split command:

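// Wrap each image URL in '|' delimiters (assumes '|' never occurs in the input).
string resultString = Regex.Replace(rawString, @"(http://.*?/\w+\.(jpg|png|gif))", "|$1|", RegexOptions.IgnoreCase);
// Split on the delimiter; dropping empty entries also takes care of a leading or trailing '|'.
string[] parts = resultString.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);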

0

The obvious answer here is of course not to use split, but rather matching the image patterns and retrieving them. That being said, it's not impossible to use split.

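string ImageRegPattern = @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))|(?<=(?:\.jpg|\.png|\.gif))";

This matches any point in the string that is either followed by an image URL or preceded by .jpg, .gif or .png. The groups are non-capturing because Regex.Split adds any captured text to the result array; a match at the very start of the input also yields an empty first element.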

I really don't recommend doing it this way, I'm just saying you can.
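
If you want to try it anyway, here is a rough sketch. It uses a non-capturing variant of the lookaround pattern (Regex.Split would otherwise add captured text to the result) and filters out the empty entry that a zero-width match at the start of the input leaves behind. LookaroundSplitDemo and SplitAtImageUrls are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Zero-width split points: just before an image URL, or just after an
    // image extension. Non-capturing groups keep extra entries out of the result.
    private const string SplitPoints =
        @"(?=(?:http://[\w./]*?\.jpg|http://[\w./]*?\.png|http://[\w./]*?\.gif))" +
        @"|(?<=(?:\.jpg|\.png|\.gif))";

    // Hypothetical helper name.
    public static string[] SplitAtImageUrls(string source)
    {
        return Regex.Split(source, SplitPoints)
                    .Where(part => part.Length > 0) // drop the empty leading entry
                    .ToArray();
    }

    public static void Main()
    {
        string a = "http://www.dsa.com/asd/jpg/good.jpgThis is a good dayhttp://www.a.com/b.pngWe are the Best friendshttp://www.c.com";
        foreach (string part in SplitAtImageUrls(a))
            Console.WriteLine(part);
    }
}
```

Note that the lookbehind also splits after any .jpg, .png or .gif that appears in ordinary text, which is one more reason to prefer matching over splitting.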
