2

We are trying to use URLs for complex querying and filtering.
I managed to get some of the simpler parts working using expression trees and a mix of regex and string manipulation, but then we looked at a more complex example string:

 var filterstring="(|(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))(description:lk:”*and*”))";

I'd like to be able to parse this into parts, and to do it recursively. I'd like the output to look like:

   item[0] ^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments)
   item[1] description:lk:”*and*”

From there I could strip down item[0] to get:

categoryid:eq:1,2,3,4
categoryname:eq:condiments

At the minute I'm using regex and string manipulation to find the | and ^ characters, which tell me whether a group is an AND or an OR. The regex matches brackets and works well for a single item; it's when the values are nested that I'm struggling.

The regex looks like this:

@"\((.*?)\)"

I need some way of using a regex to match the nested brackets; any help would be appreciated.
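For reference: the lazy .*? in the regex above stops at the first ")", which is why nested groups come out truncated. .NET's regex engine can match balanced brackets with balancing groups instead. The sketch below is a minimal illustration, assuming values never contain brackets themselves; the class and method names (NestedGroups, TopLevelGroups, Unwrap) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NestedGroups
{
    // Matches one balanced "( ... )" group with .NET balancing groups:
    // each inner "(" pushes a capture onto "depth", each ")" pops one,
    // and (?(depth)(?!)) fails the match if any "(" is left unclosed.
    private static readonly Regex Balanced = new Regex(
        @"\((?>[^()]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)");

    // Returns the top-level bracketed groups found in 'body', in order.
    public static List<string> TopLevelGroups(string body) =>
        Balanced.Matches(body).Cast<Match>().Select(m => m.Value).ToList();

    // Strips one pair of outer brackets: "(x)" -> "x".
    public static string Unwrap(string group) =>
        group.Substring(1, group.Length - 2);
}
```

Calling TopLevelGroups on Unwrap(filterstring) returns "(^(categoryid:eq:1,2,3,4)(categoryname:eq:condiments))" and "(description:lk:”*and*”)". Unwrapping the first one and calling TopLevelGroups again yields the two category conditions; the character left in front of the groups after unwrapping (| or ^) is the operator, which is how the recursion keeps track of AND/OR.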

6
  • 4
    I think the question is too complicated; it is not really easy to understand what the problem is. E.g. it might be the regex, ServiceStack, the URL, OData or something else. Try to explain it to a rubber duck: codinghorror.com/blog/2012/03/rubber-duck-problem-solving.html Commented Jul 11, 2013 at 9:39
  • 4
    I think it all comes down to matching nested brackets. I know that this is possible in PHP, Perl and .NET. Otherwise, you might just write a small parser; it's not that complex. Commented Jul 11, 2013 at 9:43
  • I think HamZa is correct; it's probably as simple as matching the brackets, but it's the nesting that's causing me the problem. Commented Jul 11, 2013 at 9:46
  • 2
    @Andyroo It's a good question. There may be some improvements since the problem is just the brackets and there is superfluous information that has nothing to do with the core of the problem. That said, I'm pretty sure there is a duplicate on SO on how to match/parse nested brackets in C#, searching for it... Commented Jul 11, 2013 at 9:51
  • 1
    @Andyroo take a look at this answer. It seems promising. Commented Jul 11, 2013 at 10:13

3 Answers

2

You could transform the string into valid XML (just a few simple replacements, no validation):

// Each "(" opens a <node>, each ")" closes it; the operators become empty marker elements.
var output = filterstring
    .Replace("(","<node>")
    .Replace(")","</node>")
    .Replace("|","<andNode/>")
    .Replace("^","<orNode/>");

Then, you could parse the XML nodes by using, for example, System.Xml.Linq.

XDocument doc = XDocument.Parse(output);

Based on your comment, here's how you rearrange the XML in order to get the wrapping you need:

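// ToList() takes a snapshot, because the loop body moves elements around the tree.
foreach (var item in doc.Root.Descendants().ToList())
{
    if (item.Name == "orNode" || item.Name == "andNode")
    {
        // Move every following sibling inside the operator element,
        // so the operator ends up wrapping its operands.
        item.ElementsAfterSelf()
            .ToList()
            .ForEach(x =>
            {
                x.Remove();
                item.Add(x);
            });
    }
}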

Here's the resulting XML content:

<node>
  <andNode>
    <node>
      <orNode>
        <node>categoryid:eq:1,2,3,4</node>
        <node>categoryname:eq:condiments</node>
      </orNode>
    </node>
    <node>description:lk:”*and*”</node>
  </andNode>
</node>

4 Comments

Thanks Alex, that was a good suggestion, but I'm still missing the matching of the nested brackets. I might be able to modify it a bit though.
Could you elaborate on "I'm still missing the matching of the nested brackets"?
Due to the way the () are nested I get <nodeOr /> instead of the <nodeOr><node></node></nodeOr> wrapping I would need.
Well, once we have the XML, it's only manipulation. You can always check the first node in the child collection: 1. If it's an operator node (orNode or andNode), apply the operation to its siblings, or take the siblings and make them the operator node's children. 2. If it's a non-operator node, continue parsing.
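Once the operator elements wrap their operands (as in the resulting XML above), the "check the first child" walk can be written as a short recursive function. This is a sketch, not code from the answer: FilterXml and Render are hypothetical names, and it keeps the andNode/orNode element names as-is rather than interpreting them.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FilterXml
{
    // Renders a rearranged <node> tree as a nested string.
    // A <node> whose first child is <andNode>/<orNode> is a group;
    // any other <node> is a leaf holding a name:operation:value condition.
    public static string Render(XElement node)
    {
        XElement first = node.Elements().FirstOrDefault();
        bool isOperator = first != null &&
            (first.Name == "andNode" || first.Name == "orNode");
        if (!isOperator)
            return node.Value.Trim();   // leaf: the condition text

        // Operator element: its <node> children are the operands; recurse into each.
        var operands = first.Elements("node").Select(e => Render(e));
        return first.Name.LocalName + "(" + string.Join(", ", operands) + ")";
    }
}
```

For the XML shown above, Render(doc.Root) gives "andNode(orNode(categoryid:eq:1,2,3,4, categoryname:eq:condiments), description:lk:”*and*”)". Building expression-tree nodes instead of strings would follow the same shape.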
1

I understand that you want the values specified in the filterstring.

My solution would be something like this:

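using System.Collections.Specialized;
using System.Text.RegularExpressions;

NameValueCollection values = new NameValueCollection();
// Each match is one (name:operation:value) leaf; operators and outer brackets are skipped.
foreach (Match pair in Regex.Matches(filterstring, @"\((?<name>\w+):(?<operation>\w+):(?<value>[^)]*)\)"))
{
     // Only equality conditions are kept.
     if (pair.Groups["operation"].Value == "eq")
         values.Add(pair.Groups["name"].Value, pair.Groups["value"].Value);
}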

The regex understands a (name:operation:value) triple; it ignores everything else.

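After this code has run you can get the values like this:

values["categoryid"]
values["categoryname"]

values["description"] stays null with the code as written, because its operation is lk rather than eq; drop the eq check if you need it as well.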

I hope this will help you in your quest.
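The eq check in the loop drops every condition with another operation, so description (an lk condition) never reaches the collection. A hypothetical variant that keeps the operation next to each value, using the answer's pattern unchanged (FilterTriples and Extract are illustrative names; value tuples assume C# 7 or later):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterTriples
{
    // The answer's pattern: one (name:operation:value) leaf per match.
    private static readonly Regex Leaf = new Regex(
        @"\((?<name>\w+):(?<operation>\w+):(?<value>[^)]*)\)");

    // Returns every leaf condition in document order, whatever its operation.
    public static List<(string Name, string Operation, string Value)> Extract(string filter) =>
        Leaf.Matches(filter).Cast<Match>()
            .Select(m => (m.Groups["name"].Value,
                          m.Groups["operation"].Value,
                          m.Groups["value"].Value))
            .ToList();
}
```

This still flattens the tree, so the | and ^ groupings are lost; it only fixes the dropped lk condition.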

1 Comment

Thanks, that is almost perfect. I just need to tweak it so I can split the groupings up. For example, I would need categoryid and categoryname in one grouping, as they are wrapped together, and description in another; the | and ^ distinguish whether a grouping is AND or OR, so they would need retaining somehow too.
0

I think you should just write a proper parser for this; it would actually end up simpler and more extensible, and save you time and headaches in the future. You can use an existing parser generator such as Irony or ANTLR.
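For a grammar this small, a hand-written recursive-descent parser (swapped in here instead of Irony or ANTLR) is also an option. The sketch below assumes the operator character comes directly after an opening bracket and that values never contain ")"; all type and member names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public abstract class FilterNode { }

public sealed class FilterCondition : FilterNode
{
    public string Field, Operation, Value;
}

public sealed class FilterGroup : FilterNode
{
    public char Operator;                       // '|' or '^', kept exactly as written
    public List<FilterNode> Children = new List<FilterNode>();
}

public sealed class FilterParser
{
    private readonly string _s;
    private int _pos;

    private FilterParser(string s) { _s = s; }

    public static FilterNode Parse(string s)
    {
        var p = new FilterParser(s);
        FilterNode node = p.ParseExpr();
        if (p._pos != s.Length)
            throw new FormatException("Unexpected text at position " + p._pos);
        return node;
    }

    // expr := '(' ( ('|' | '^') expr* | name ':' operation ':' value ) ')'
    private FilterNode ParseExpr()
    {
        Expect('(');
        FilterNode result;
        if (Peek() == '|' || Peek() == '^')
        {
            var group = new FilterGroup { Operator = _s[_pos++] };
            while (Peek() == '(')
                group.Children.Add(ParseExpr());    // recursion handles any nesting depth
            result = group;
        }
        else
        {
            // A condition runs up to the next ')' and splits into at most three parts.
            int end = _s.IndexOf(')', _pos);
            if (end < 0) throw new FormatException("Missing ')'");
            string[] parts = _s.Substring(_pos, end - _pos).Split(new[] { ':' }, 3);
            if (parts.Length != 3) throw new FormatException("Expected name:operation:value");
            result = new FilterCondition { Field = parts[0], Operation = parts[1], Value = parts[2] };
            _pos = end;
        }
        Expect(')');
        return result;
    }

    private char Peek() => _pos < _s.Length ? _s[_pos] : '\0';

    private void Expect(char c)
    {
        if (Peek() != c)
            throw new FormatException($"Expected '{c}' at position {_pos}");
        _pos++;
    }
}
```

Parsing the question's filterstring gives a '|' group containing a '^' group (the two category conditions) and the description condition. Because each group keeps its operator character, mapping | and ^ to AND/OR, and on to expression trees, can happen in a single pass over the result.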

2 Comments

Nice idea but I fear that may be way above my skill level.
@Andyroo it is not as hard as it seems (actually easier than complex regexes, IMO) and it is a valuable skill to have. But it is up to you, of course.
