2

I have text like this:

((#) This is text

    ((#) This is subtext 

        ((#) This is sub-subtext #)

    #)

 #)

I wrote the following regex replacement:

        var counter = 0;
        return Regex.Replace(text,
             @"\(\(#\)(.*?)#\)",
             m =>
             {
                var str = m.ToString();
                counter++;
                return counter + ") " + str.Replace("((#)", "").Replace("#)", "");
             });

The result I expected is:

1) This is text
   2) This is subtext
       3) This is sub-subtext

I know this will not work properly: the regex pairs each ((#) with the nearest #), which may belong to a different, more deeply nested ((#).

How can I avoid this conflict? Thanks! :)

5 Comments
  • If you change the regex to @"\(\(#\)(.*)" you will partly get the output you need, it will still have #)s. Are you looking to obtain nested substrings? Commented Dec 16, 2015 at 9:37
  • Yes, it must be nested substrings. Commented Dec 16, 2015 at 9:39
  • Possible duplicate of Can regular expressions be used to match nested patterns? Commented Dec 16, 2015 at 9:39
  • @OndrejSvejdar: No, it is not since the accepted answer is not appropriate for .NET. Podeig, the problem here is that you cannot do it within one single operation. 1) Get the nested strings, 2) replace in a loop. Commented Dec 16, 2015 at 9:39
  • @stribizhev Could you provide an example? Commented Dec 16, 2015 at 9:46

2 Answers

1

Here is the solution I suggest:

  • Get the nested strings with the regex featuring balanced groups,
  • Replace the substrings in a loop.

See the regex demo here. The regex itself matches only empty strings (the whole pattern is a lookahead), but its capturing group collects every nested substring that starts with ((#) and ends with #).

Here is C# demo code:

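using System.Linq;
using System.Text.RegularExpressions;

var text = @"((#) This is text

    ((#) This is subtext 

        ((#) This is sub-subtext #)

     #)

#)";
// The lookahead lets matches overlap, so every nested chunk ends up in group 1;
// the balancing group D keeps track of the nesting depth.
var chunks = Regex.Matches(text,
            @"(?s)(?=(\(\(#\)(?>(?!\(\(#\)|#\)).|\(\(#\)(?<D>)|#\)(?<-D>))*(?(D)(?!))#\)))")
               .Cast<Match>().Select(p => p.Groups[1].Value)
               .ToList();
// Chunks are found outermost first, so replacing one leaves its inner chunks
// intact in the text for the following iterations.
for (var i = 0; i < chunks.Count; i++)
     text = text.Replace(chunks[i], string.Format("{0}) {1}", (i+1),
                         chunks[i].Substring(4, chunks[i].Length-6).Trim()));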

Note that .Substring(4, chunks[i].Length-6) just strips the delimiters: it skips the 4 characters of the leading ((#) and drops the 2 characters of the trailing #). Since we know the delimiters, we can hardcode these values.

Output: [image not preserved]

To learn more about balancing groups, see Balancing Groups Definition and Fun With .NET Regex Balancing Groups.
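To make the structure of the pattern easier to follow, here is a sketch of the same regex spread over several lines with RegexOptions.IgnorePatternWhitespace. The class name BalancedChunks and the method Find are made up for illustration. One assumption to note: in this mode an unescaped # starts a comment, so the literal # characters are written as \# here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedChunks
{
    // Same pattern as above, annotated. Literal '#' is escaped as \# because
    // IgnorePatternWhitespace treats a bare '#' as the start of a comment.
    const string Pattern = @"
        (?=                          # lookahead only, so nested chunks can overlap
          (                          # group 1: one complete chunk
            \(\(\#\)                 # opening delimiter
            (?>
                (?!\(\(\#\)|\#\)).   # any character that does not start a delimiter
              | \(\(\#\) (?<D>)      # nested opener: push onto stack D
              | \#\) (?<-D>)         # closer: pop from stack D, fails when D is empty
            )*
            (?(D)(?!))               # fail if an opener is still unclosed
            \#\)                     # closing delimiter of this chunk
          )
        )";

    // Hypothetical helper: returns every balanced chunk, outermost first.
    public static string[] Find(string text) =>
        Regex.Matches(text, Pattern,
                      RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();
}
```

RegexOptions.Singleline plays the role of the inline (?s) in the original pattern, so that . also matches line breaks.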

1 Comment

Thank you! You brought me much closer to the final solution! I need to read more about balancing group definitions :)
0

I believe this to be impossible, because your grammar is at its core recursive:

TEXT := "((#)" TEXT "#)"

Which is something that cannot be consumed by a regular expression, because regular expressions (in the formal sense) can only handle languages generated by a regular grammar.

In that sense, the question linked by Ondrej actually does answer your problem, just not how you want it.

The only way you can handle this with regular expressions is by limiting yourself to a fixed depth of recursion and matching everything up to that depth, which I think is not what you want.

To make this work for any number of nesting levels, you will have no other choice (that I know of) than using a parser for context-free languages.
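For this particular format, the parsing step can stay very small. The following is a minimal sketch, not a general context-free parser: it scans the text once, uses a depth counter as a stand-in for the parser's stack, replaces each ((#) with a running number, and drops each #). The names NestedNumbering and Number are hypothetical, and throwing FormatException on unbalanced input is an assumption about the desired behavior.

```csharp
using System;
using System.Text;

public static class NestedNumbering
{
    const string Open = "((#)";
    const string Close = "#)";

    // Single pass over the text: number every opener, drop every closer,
    // and verify that the delimiters are balanced.
    public static string Number(string text)
    {
        var output = new StringBuilder();
        int counter = 0;   // running number across all nesting levels
        int depth = 0;     // how many openers are currently unclosed
        int i = 0;

        while (i < text.Length)
        {
            // Check for the opener first, because "((#)" itself contains "#)".
            if (string.CompareOrdinal(text, i, Open, 0, Open.Length) == 0)
            {
                counter++;
                depth++;
                output.Append(counter).Append(')');
                i += Open.Length;
            }
            else if (string.CompareOrdinal(text, i, Close, 0, Close.Length) == 0)
            {
                if (depth == 0)
                    throw new FormatException("Closing #) without a matching ((#) at index " + i);
                depth--;
                i += Close.Length;
            }
            else
            {
                output.Append(text[i]);
                i++;
            }
        }

        if (depth != 0)
            throw new FormatException("Missing #) for " + depth + " opening ((#)");

        return output.ToString();
    }
}
```

Calling NestedNumbering.Number(text) on the sample from the question keeps the original indentation and line breaks; whitespace that preceded a removed #) is left in place, so trim the lines afterwards if that matters.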

3 Comments

"Which is something that cannot be consumed by a regular expression" is totally wrong in the context of .NET regular expressions. See Balancing Groups Definition and Fun With .NET Regex Balancing Groups.
@stribizhev I was not aware of that feature, thanks for the hint.
Some other regex flavors support recursion; see the Regular Expression Recursion page on regular-expressions.info.
