8

I'd like to do a Regex.Split on some separators, but I'd like to keep the separators in the result. For example, this is what I'm trying to get:

"abc[s1]def[s2][s3]ghi" --> "abc", "[s1]", "def", "[s2]", "[s3]", "ghi"

The regular expression I've come up with is new Regex("\\[|\\]|\\]\\["). However, this gives me the following:

"abc[s1]def[s2][s3]ghi" --> "abc", "s1", "def", "s2", "", "s3", "ghi"

The separators have disappeared (which makes sense given my regex). Is there a way to write the regex so that the separators themselves are preserved?

2 Answers

12

Use zero-length matching lookarounds; you want to split on

    (?=\[)|(?<=\])

That is, split at any position where a literal [ lies ahead, or where a literal ] lies behind. Because lookarounds consume no characters, the brackets stay in the resulting pieces.

As a C# string literal, this is

    @"(?=\[)|(?<=\])"

Example in Java

    System.out.println(java.util.Arrays.toString(
        "abc[s1]def[s2][s3]ghi".split("(?=\\[)|(?<=\\])")
    ));
    // prints "[abc, [s1], def, [s2], [s3], ghi]"

    System.out.println(java.util.Arrays.toString(
        "abc;def;ghi;".split("(?<=;)")
    ));
    // prints "[abc;, def;, ghi;]"

    System.out.println(java.util.Arrays.toString(
        "OhMyGod".split("(?=(?!^)[A-Z])")
    ));
    // prints "[Oh, My, God]"
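
The difference between the pattern from the question and the lookaround pattern can be seen side by side. This is only a minimal sketch; the class and method names (SplitContrast, splitConsuming, splitKeeping) are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;
import java.util.List;

class SplitContrast {
    // Pattern from the question: every alternative consumes a bracket,
    // so the brackets are dropped from the result.
    static List<String> splitConsuming(String input) {
        return Arrays.asList(input.split("\\[|\\]|\\]\\["));
    }

    // Lookaround pattern: matches zero-width positions only,
    // so no character is removed from the pieces.
    static List<String> splitKeeping(String input) {
        return Arrays.asList(input.split("(?=\\[)|(?<=\\])"));
    }

    public static void main(String[] args) {
        String input = "abc[s1]def[s2][s3]ghi";
        System.out.println(splitConsuming(input));
        // prints "[abc, s1, def, s2, , s3, ghi]"
        System.out.println(splitKeeping(input));
        // prints "[abc, [s1], def, [s2], [s3], ghi]"
    }
}
```

The empty string in the first result comes from the adjacent "][": the "]" and the "[" are consumed as two separate separators with nothing between them.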

1

You could use Regex.Matches instead of Regex.Split and match the tokens themselves rather than the gaps between them (example: http://www.ideone.com/gUjRM):

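    using System.Text.RegularExpressions;

    string x = "abc[s1]def[s2][s3]ghi";
    // Match either a run of non-'[' characters or a complete [...] group.
    var r = new Regex(@"[^\[]+|\[[^\]]+\]");
    var ms = r.Matches(x);
    // do stuff with the MatchCollection `ms`.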
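
For comparison with the Java examples in the other answer, the same match-instead-of-split idea might look roughly like this in Java. It is a sketch only: the class name MatchTokens and the method tokens are hypothetical, and the pattern is simply the one from the C# snippet with Java string escaping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchTokens {
    // Same pattern as the C# snippet: a run of non-'[' characters,
    // or a complete bracketed group such as "[s1]".
    private static final Pattern TOKEN =
        Pattern.compile("[^\\[]+|\\[[^\\]]+\\]");

    // Collects every match in order of appearance.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("abc[s1]def[s2][s3]ghi"));
        // prints "[abc, [s1], def, [s2], [s3], ghi]"
    }
}
```

One thing to keep in mind with this approach: input that fits neither alternative, such as a [ with no closing ], is skipped silently instead of showing up in the result.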
