I'm not really that good with regex, but I understand the basics. I'm trying to figure out how to do a conditional replace based upon a certain value in the match. For example:

Suppose I have some nested string structure that looks like this:

"[id value]" // id and value are space-delimited; id will never contain spaces

id is a string that names the [] item, and value is another nested [id value] item. It's possible for value to be empty, but I'm not worried about that for now.

If I have something like this:

A) "[vehicle [toyota camry]]"
or
B) "[animal [dog rufus]]"

I'd like to call a certain function (ToString(), for example) chosen by id, and have its output substituted in as Regex.Replace works outward from the innermost [] structure.

Pseudocode for example A:

string result = "{0}";
var firstValueComboID = GetInteriorValueAndIDFrom("[vehicle [toyota camry]]");
// firstValueComboID.ToString() == "Company: Toyota, Make: Camry"

result = String.Format(result, firstValueComboID.ToString());


var secondValueComboID = GetSecondValueAndIDFrom("[vehicle [toyota camry]]");
// secondValueComboID.ToString() == "Type: Vehicle, {0}"

result = String.Format(result, secondValueComboID.ToString());

This sample obviously has nothing to do with regex, but it hopefully illustrates kind of what I'm trying to do.

  • Are these [] only two deep and never deeper? Commented Oct 12, 2010 at 1:40
  • No they can be infinitely deep, at least in theory. In practice, they are usually 5 to 6 maximum. This isn't really for a real application either, I'm more or less just messing around trying to learn regex and ran into this problem. Commented Oct 12, 2010 at 1:55
  • I think your example is the wrong way around: you want the 2nd one to be first, as it has the format string. Commented Oct 12, 2010 at 1:56
  • Can you provide an example that's nested three or four deep? Commented Oct 12, 2010 at 1:59
  • @Shawn: if they are arbitrarily deep, you can't do this with regex. You'll need a parser. Commented Oct 12, 2010 at 2:02

2 Answers

Do I understand you correctly that all strings you want to parse have the form

[id1 [id2 [id3 [id4 ... value]]]],

i.e. all brackets close at the end? Your question and examples seem to point that way. If that's true, parsing it with a regex is not that difficult, depending on what you actually need your parser to do.

You could, say, use

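using System;
using System.Text.RegularExpressions;

// Splits "[id rest]" into (id, rest); rest may itself be a nested "[id value]".
static Tuple<String, String> Parse(String s)
{
    var match = Regex.Match(s, @"^\[(\w*) (.*)\]$", RegexOptions.None);
    if (!match.Success) // without this check a non-matching input silently yields ("", "")
        throw new FormatException("Expected input of the form \"[id value]\": " + s);
    return Tuple.Create(match.Groups[1].Value, match.Groups[2].Value);
}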

That would result in

var result = Parse("[animal [dog rufus]]");
// result = { Item1 = "animal", Item2 = "[dog rufus]" }
var inner = Parse(result.Item2);
// inner = { Item1 = "dog", Item2 = "rufus" }

You could call Parse recursively to get to the inner nesting levels.
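As a rough sketch of that recursion, the helper below reuses the same pattern as Parse and keeps peeling off one [id ...] level until the remaining text no longer matches. The names NestedParseDemo and Flatten are made up for illustration, and the sketch assumes the "all brackets close at the end" shape described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedParseDemo
{
    // Same pattern as Parse: "[id rest]" -> group 1 = id, group 2 = rest.
    static readonly Regex Item = new Regex(@"^\[(\w*) (.*)\]$");

    // Hypothetical helper: returns the ids from outermost to innermost,
    // followed by the innermost value.
    public static List<string> Flatten(string s)
    {
        var match = Item.Match(s);
        if (!match.Success)
        {
            // No more brackets to peel: s is the innermost value.
            return new List<string> { s };
        }

        var parts = new List<string> { match.Groups[1].Value };
        parts.AddRange(Flatten(match.Groups[2].Value)); // recurse into the nested item
        return parts;
    }
}
```

For "[animal [dog rufus]]" this yields the list animal, dog, rufus. Walking that list from the end would give the innermost-first order the question asks about.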

Please ask if you have requirements I did not understand =)

1 Comment

This will work only if the second element of the tuple is always what needs to be recursed. I have not verified it myself, but I'm quite sure that this will fail to parse "[[a b] [c d]]". That depends on the OP's grammar, of course.

JoshD correctly points out that this grammar you've proposed (having matching pairs of brackets) cannot be parsed using a regular expression. You need to construct a custom parser with recursive descent behavior.
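A minimal sketch of such a recursive descent parser follows. The grammar is an assumption on my part (an element is either a bare word or a bracketed, space-separated list of elements), and Node, BracketParser and Depth are hypothetical names, not anything from the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node: a bare word (Children == null) or a bracketed list.
public sealed class Node
{
    public string Word;
    public List<Node> Children;
}

public static class BracketParser
{
    public static Node Parse(string input)
    {
        int pos = 0;
        Node root = ParseElement(input, ref pos);
        if (pos != input.Length)
            throw new FormatException("Unexpected trailing text at position " + pos);
        return root;
    }

    // element := '[' (element | ' ')* ']' | word
    static Node ParseElement(string s, ref int pos)
    {
        if (pos < s.Length && s[pos] == '[')
        {
            pos++; // consume '['
            var children = new List<Node>();
            while (pos < s.Length && s[pos] != ']')
            {
                if (s[pos] == ' ') { pos++; continue; } // skip separators
                children.Add(ParseElement(s, ref pos)); // descend into nested element
            }
            if (pos >= s.Length)
                throw new FormatException("Missing closing ']'");
            pos++; // consume ']'
            return new Node { Children = children };
        }

        // A word runs until the next space or bracket.
        int start = pos;
        while (pos < s.Length && s[pos] != ' ' && s[pos] != '[' && s[pos] != ']')
            pos++;
        if (pos == start)
            throw new FormatException("Expected a word at position " + pos);
        return new Node { Word = s.Substring(start, pos - start) };
    }

    // Deepest level of bracket nesting in a parsed tree (0 for a bare word).
    public static int Depth(Node n)
    {
        if (n.Children == null) return 0;
        int max = 0;
        foreach (var c in n.Children) max = Math.Max(max, Depth(c));
        return max + 1;
    }
}
```

Because each '[' triggers a recursive call, the nesting depth is limited only by the call stack, and inputs such as "[[a b] [c d]]" parse as well. Per-id formatting could then be done by walking the tree from the leaves upward.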

9 Comments

It can be done with .NET, it's just very ugly. See here: msdn.microsoft.com/en-us/library/…
That link looks like it directly addresses my question, but I don't have the regex skills to tailor it to my need. It sounds like a parser is the better approach anyway.
@James, regular expressions in a more theoretical/mathematical sense indeed only match/parse regular languages, but most modern day regular expression implementations can match/parse more than regular languages. I'm not even talking about recursive patterns, think 'back-references': (.).*?\1.
@James, note that I never said it would be a good idea to use regex for tasks like validating/parsing a language like (X)HTML. I simply said that you can't just say that something can't be done using a (modern day) regex-engine because the target language/string is not "regular". For example, if you want to match a character that occurs at least 5 times in a string, you could do that using a regex like: (.)(?:.*?\1){4}, which matches cdbcbccaac from the target string abcdbcbccaacabdddd. But this "language" is, AFAIK, not regular (but suitable for a (modern-day) regex, IMO).
@James, ... and of course I agree with you that HTML and regex shouldn't belong in the same sentence (unless a 'not' or 'never' is present)! :)
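To make the two regex features mentioned in these comments concrete, here is a small sketch. The balanced-bracket pattern is my own adaptation of .NET's balancing group construct to [ ] pairs, not the pattern from the elided link, and the class and method names are hypothetical. The second method simply runs the back-reference pattern quoted above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBeyondRegular
{
    // Balancing groups: each '[' pushes a capture onto the "open" stack and each ']'
    // pops one; (?(open)(?!)) forces a failure if any '[' is still unclosed at the end.
    static readonly Regex Balanced = new Regex(
        @"^(?:[^\[\]]|(?<open>\[)|(?<-open>\]))*(?(open)(?!))$");

    public static bool IsBalanced(string s)
    {
        return Balanced.IsMatch(s);
    }

    // Back-reference: \1 must repeat whatever character group 1 captured,
    // so this finds a character followed by four more occurrences of itself.
    public static string FirstRepeatedRun(string s)
    {
        return Regex.Match(s, @"(.)(?:.*?\1){4}").Value;
    }
}
```

IsBalanced("[vehicle [toyota camry]]") is true and IsBalanced("[a [b c]") is false, while FirstRepeatedRun("abcdbcbccaacabdddd") returns "cdbcbccaac". This only validates nesting; extracting each level still takes extra work with captures, which is part of why a parser tends to be easier to maintain.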