Nested Regex Replace in C#

Question

I'm not really that good with regex, but I understand the basics. I'm trying to figure out how to do a conditional replace based upon a certain value in the match. For example:

Suppose I have a nested string structure that looks like this:

"[id value]"  // id and value are space-delimited; id never contains spaces

id is a string that names the [] item, and value is another nested [id value] item. It's possible for value to be empty, but I'm not worried about that for now.

If I have something like this:

A) "[vehicle [toyota camry]]"
or
B) "[animal [dog rufus]]"

I'd like to call a particular function (ToString(), for example), chosen by id, whose output becomes the replacement as Regex.Replace works outward from the innermost [] structure.

Pseudocode for example A:

// GetInteriorValueIdFrom and GetSecondValueIdFrom are placeholder helpers, not real APIs.
string result = "{0}";
var firstValueIdCombo = GetInteriorValueIdFrom("[vehicle [toyota camry]]");
// firstValueIdCombo.ToString() == "Company: Toyota, Make: Camry"

result = String.Format(result, firstValueIdCombo.ToString());

var secondValueIdCombo = GetSecondValueIdFrom("[vehicle [toyota camry]]");
// secondValueIdCombo.ToString() == "Type: Vehicle, {0}"

result = String.Format(result, secondValueIdCombo.ToString());

This sample obviously has nothing to do with regex, but it hopefully illustrates roughly what I'm trying to do.
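One way to get the innermost-first behaviour described above with Regex.Replace alone is to run it in a loop: a pattern that only matches brackets with nothing nested inside them is rewritten through a MatchEvaluator, and the pass is repeated until nothing changes. This is a sketch under stated assumptions: ids match `\w+`, values contain no stray brackets, and `Describe` with its output strings is a hypothetical stand-in for the per-id ToString() calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches an innermost "[id value]": ids are assumed to be \w+ and the value
// contains no brackets, so only items with nothing nested inside them match.
var innermost = new Regex(@"\[(\w+) ([^\[\]]*)\]");

// Hypothetical per-id formatting standing in for the question's ToString() calls.
string Describe(string id, string value)
{
    switch (id)
    {
        case "vehicle": return "Type: Vehicle, " + value;
        case "animal":  return "Type: Animal, " + value;
        case "toyota":  return "Company: Toyota, Make: " + Capitalize(value);
        case "dog":     return "Species: Dog, Name: " + Capitalize(value);
        default:        return id + ": " + value;
    }
}

string Capitalize(string s) =>
    s.Length == 0 ? s : char.ToUpperInvariant(s[0]) + s.Substring(1);

// Each pass replaces every innermost item with its description (removing one
// bracket pair per replacement); the loop stops once a pass changes nothing.
string Expand(string input)
{
    string previous;
    do
    {
        previous = input;
        input = innermost.Replace(input, m => Describe(m.Groups[1].Value, m.Groups[2].Value));
    } while (input != previous);
    return input;
}

Console.WriteLine(Expand("[vehicle [toyota camry]]"));
// Type: Vehicle, Company: Toyota, Make: Camry
Console.WriteLine(Expand("[animal [dog rufus]]"));
// Type: Animal, Species: Dog, Name: Rufus
```

Because each pass peels off one level, nesting depth is bounded only by the number of passes, not by the pattern itself.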

No, they can be infinitely deep, at least in theory. In practice, they are usually 5 to 6 levels at most. This isn't really for a real application either; I'm more or less just messing around trying to learn regex and ran into this problem. – Shawn, Oct 12, 2010 at 1:55
I think your example is the wrong way around; you want the second one first, since it contains the format string. – Luke Schafer, Oct 12, 2010 at 1:56
Can you provide an example that's nested three or four deep? – BillP3rd, Oct 12, 2010 at 1:59
@Shawn: if they are arbitrarily deep, you can't do this with regex. You'll need a parser. – JoshD, Oct 12, 2010 at 2:02

Jens · Accepted Answer · 2010-10-13 07:09:58Z

2

Do I understand you correctly that all the strings you want to parse have the form

[id1 [id2 [id3 [id4 .. value]] ... ],

i.e. all the brackets close at the end? Your question and examples seem to point that way. If that's true, parsing it with a regex is not that difficult, depending on what you actually need your parser to do.

You could, say, use

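using System;
using System.Text.RegularExpressions;

// Splits "[id rest]" into (id, rest). If s does not have that shape, the
// match fails and both groups come back as empty strings.
static Tuple<String, String> Parse(String s)
{
    var match = Regex.Match(s, @"^\[(\w*) (.*)\]$");
    return Tuple.Create(match.Groups[1].Value, match.Groups[2].Value);
}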

That would result in

var result = Parse("[animal [dog rufus]]");
// result = { Item1 = "animal", Item2 = "[dog rufus]" }
var inner = Parse(result.Item2);
// inner = { Item1 = "dog", Item2 = "rufus" }

You could call Parse recursively to get to the inner nesting levels.
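A sketch of that recursion, reusing the same regex as `Parse` above. The hypothetical `ParseAll` helper and the `Success` flag (used as the base case instead of relying on the empty groups of a failed match) are illustrative additions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same regex as Parse above; Success reports whether s had the "[id rest]" shape.
(bool Success, string Id, string Rest) Parse(string s)
{
    var match = Regex.Match(s, @"^\[(\w*) (.*)\]$");
    return (match.Success, match.Groups[1].Value, match.Groups[2].Value);
}

// Calls Parse recursively, one nesting level per call, until the remainder is
// plain text. Returns the ids (outermost first) and the innermost value.
(List<string> Ids, string Value) ParseAll(string s)
{
    var (success, id, rest) = Parse(s);
    if (!success)
        return (new List<string>(), s);   // base case: a plain value such as "rufus"
    var (ids, value) = ParseAll(rest);    // recurse into the nested item
    ids.Insert(0, id);                    // prepend so the outer id comes first
    return (ids, value);
}

var (allIds, innerValue) = ParseAll("[animal [dog rufus]]");
Console.WriteLine(string.Join(", ", allIds) + " -> " + innerValue);
// animal, dog -> rufus
```

As the comment below points out, this only works while the nested item is always the second part; an input such as "[[a b] [c d]]" does not match at all and comes back as a plain value.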

Please ask if you have requirements I did not understand =)


1 Comment

James Dunne Over a year ago

This will work only if the second element of the tuple is always what needs to be recursed into. I have not verified it myself, but I'm quite sure that this will fail to parse "[[a b] [c d]]". That depends on the OP's grammar, of course.

James Dunne · Accepted Answer · 2010-10-12 03:59:57Z

1

JoshD correctly points out that the grammar you've proposed (arbitrarily deep matching pairs of brackets) cannot be parsed with a regular expression. You need a custom parser, for example a recursive-descent parser.
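A minimal recursive-descent sketch for the grammar in the question, assuming `item := '[' id ' ' value ']'` and `value := item | text`, where text contains no brackets. The function names and the `describe` callback (standing in for the per-id formatting) are hypothetical. Because a nested item is fully parsed before `describe` is called for its parent, the innermost item is evaluated first.

```csharp
using System;

// item := '[' id ' ' value ']'   (assumed grammar)
// Parses an item at s[pos], advances pos past it, and returns describe(id, value).
string ParseItem(string s, ref int pos, Func<string, string, string> describe)
{
    Expect(s, ref pos, '[');
    int start = pos;
    while (pos < s.Length && s[pos] != ' ' && s[pos] != '[' && s[pos] != ']') pos++;
    string id = s.Substring(start, pos - start);
    Expect(s, ref pos, ' ');
    string value = pos < s.Length && s[pos] == '['
        ? ParseItem(s, ref pos, describe)   // nested item: recurse, so it finishes first
        : ParseText(s, ref pos);            // plain text value (may be empty)
    Expect(s, ref pos, ']');
    return describe(id, value);
}

// text := any run of characters other than '[' and ']'
string ParseText(string s, ref int pos)
{
    int start = pos;
    while (pos < s.Length && s[pos] != '[' && s[pos] != ']') pos++;
    return s.Substring(start, pos - start);
}

void Expect(string s, ref int pos, char c)
{
    if (pos >= s.Length || s[pos] != c)
        throw new FormatException($"Expected '{c}' at position {pos}");
    pos++;
}

// Parses a whole string and rejects trailing input after the outer item.
string Evaluate(string s, Func<string, string, string> describe)
{
    int pos = 0;
    string result = ParseItem(s, ref pos, describe);
    if (pos != s.Length)
        throw new FormatException($"Unexpected trailing input at position {pos}");
    return result;
}

// Example callback that makes the evaluation order visible.
Console.WriteLine(Evaluate("[vehicle [toyota camry]]", (id, value) => $"{id}({value})"));
// vehicle(toyota(camry))
```

Supporting James Dunne's "[[a b] [c d]]" shape would mean changing the grammar (for example, letting a value be a list of items), which only touches `ParseItem`.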


9 Comments

Jens Over a year ago

It can be done with .NET; it's just very ugly. See here: msdn.microsoft.com/en-us/library/…
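A minimal sketch of the .NET balancing-group technique, which is a real feature of the .NET regex engine; that it is what the truncated link above describes is an assumption. `(?<depth>)` pushes a capture for every `[`, `(?<-depth>)` pops one for every `]` (and fails if there is nothing to pop), and `(?(depth)(?!))` rejects the match if any `[` is left open. The variable name `balanced` is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches one bracketed group whose inner brackets are balanced, at any depth.
var balanced = new Regex(@"\[(?>[^\[\]]+|\[(?<depth>)|\](?<-depth>))*(?(depth)(?!))\]");

Console.WriteLine(balanced.Match("x [vehicle [toyota camry]] y").Value);
// [vehicle [toyota camry]]

// Sibling groups at the same level are found as separate matches.
foreach (Match m in balanced.Matches("[a b] [c d [e f]]"))
    Console.WriteLine(m.Value);
// [a b]
// [c d [e f]]
```

This finds balanced groups, but calling a function at every nesting level still needs code around it, which is one reason a parser is often the simpler route.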

Shawn Over a year ago

That link looks like it directly addresses my question, but I don't have the regex skills to tailor it to my needs. It sounds like a parser is the better approach anyway.

Bart Kiers Over a year ago

@James, regular expressions in the theoretical/mathematical sense do only match/parse regular languages, but most modern-day regular expression implementations can match/parse more than regular languages. I'm not even talking about recursive patterns; think 'back-references': (.).*?\1.

Bart Kiers Over a year ago

@James, note that I never said it would be a good idea to use regex for tasks like validating/parsing a language like (X)HTML. I simply said that you can't just say that something can't be done using a (modern-day) regex engine because the target language/string is not "regular". For example, if you want to match a character that occurs at least 5 times in a string, you could do that using a regex like (.)(?:.*?\1){4}, which matches cdbcbccaac in the target string abcdbcbccaacabdddd. But this "language" is, AFAIK, not regular (but suitable for a (modern-day) regex, IMO).
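The two backreference patterns from these comments can be checked with .NET's Regex class. Note that in `(.)(?:.*?\1){4}` the `{4}` counts repeats after the captured character, so a match needs five occurrences of it in total. The sample inputs other than abcdbcbccaacabdddd are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// (.).*?\1 : a character, then as little as possible, then that same character again.
Console.WriteLine(Regex.Match("abcdbc", @"(.).*?\1").Value);
// bcdb

// (.)(?:.*?\1){4} : a character followed by four more occurrences of it,
// i.e. five occurrences in total within the match.
var repeated = Regex.Match("abcdbcbccaacabdddd", @"(.)(?:.*?\1){4}");
Console.WriteLine(repeated.Value);           // cdbcbccaac
Console.WriteLine(repeated.Groups[1].Value); // c
```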

Bart Kiers Over a year ago

@James, ... and of course I agree with you that HTML and regex shouldn't belong in the same sentence (unless a 'not' or 'never' is present)! :)
