I have a program that takes restricted SQL Server WHERE clauses and removes the sections that target a certain table. An example of such a WHERE clause is

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y')

I need to strip out every portion of the query that uses the table Episode, taking account of ( and ) enclosing statements and of square brackets around field names. To do this I have

private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    Regex r = new Regex(
        $"AND\\s+\\(*\\[*{episodeTable}\\]*\\.\\[*\\w+\\]*\\s*(=|<>|<=|>=)(\\s*\\'*(NULL|\\S+|\\((.*?)\\)+)\\'*\\s*\\)*){{1}}",
        RegexOptions.IgnoreCase);

    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}

This works well, returning

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)

But now I have been asked to extend this so that we allow all of the SQL WHERE clause syntax. So we could now have a WHERE clause like

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND (Episode.Paste = 'Y') AND [Episode].[Source] = '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD')

We have to "parse" this, so I have amended the above method to

private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    Regex r = new Regex(
        $"AND\\s+\\(*\\[*{episodeTable}\\]*\\.\\[*\\w+\\]*\\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\\s+\\w+\\s+AND)(\\s*\\'*(NULL|\\S+|\\((.*?)\\)+)\\'*\\s*\\)*){{1}}",
        RegexOptions.IgnoreCase);

    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}

Using episodeTable = "Episode", this returns

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) 'POD')

This misses the matches for AND (Episode.Paste = 'Y'), AND [Episode].[Source] = '%6' and AND [Episode].[TFC] NOT IN ('LWC', 'POD').

  1. What is wrong with the regex, and how can I amend it to return what I want?

  2. Rather than make this regex any more complex, can we simplify it?

Thanks for your time.


The answer below strips out some functionality I had before (my fault for not stipulating that I needed to keep it, and also what makes this so hard: capturing all cases). So I need to match this string

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL

So in C#, I need the following code

string whereClauses = 
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) " + 
    "AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' " +
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') " +
    "AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL";
string tmp = r.Replace(whereClauses, String.Empty).Trim();

To give tmp as

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)

Stripping out all of the Episode clauses including BETWEEN statements and IS NOT NULL and IS NULL statements.

I have

AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN|(?:IS\s+)?LIKE|(?:IS\s+NOT\s+)?LIKE|BETWEEN(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)AND)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)

But this is not matching

Episode.TFC IS NULL

  • What is wrong with the regex? You're using regular expressions to modify SQL, which is an awful hack. Why can't you just modify the SQL? Commented May 1, 2018 at 13:33
  • The SQL comes in from user input. This where clause is used in one CTE query to create a tmp table, which is subsequently joined with another. I need to strip out the Episode parts of the where clause to use in the subsequent join query. As with all things like this, why I am using this method is not always clear. I am using a regex here because it seems like a convenient way to do what I want without writing a full parser - which would be a lot more work. Commented May 1, 2018 at 13:36
  • Try this one Commented May 1, 2018 at 14:00
  • @WiktorStribiżew I like that, please make a brief answer and I will accept. I think this might help someone else in the future. Commented May 1, 2018 at 14:13
  • Posted with explanations. Commented May 1, 2018 at 14:27

1 Answer

It seems you may extend your pattern in the following way:

$@"AND\s+\(*\[*{episodeTable}\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)"

See the regex demo here.
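
To see the pattern in context, here is a minimal sketch that drops it into the method from the question and runs it on the extended WHERE clause. It is written as a C# top-level program for brevity; the `Regex.Escape` call is an addition of this sketch (an assumption that the table name might contain regex metacharacters), not part of the original method.

```csharp
using System;
using System.Text.RegularExpressions;

// Same shape as the question's method, with the extended pattern substituted.
static string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    var r = new Regex(
        $@"AND\s+\(*\[*{Regex.Escape(episodeTable)}\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)",
        RegexOptions.IgnoreCase);

    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}

// The second sample WHERE clause from the question.
string input =
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) " +
    "AND ([Episode].[YN] = 'Y') AND (Episode.Paste = 'Y') " +
    "AND [Episode].[Source] = '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD')";

// Prints (with one leading space):
//  AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)
Console.WriteLine(BuildResourceWhereClauses(input, "Episode"));
```

Only the operators listed in Group 1 are covered, so clauses using LIKE, BETWEEN or IS NULL are left untouched by this version.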

Details

  • AND - a substring
  • \s+ - 1+ whitespaces
  • \(* - 0+ ( chars
  • \[* - 0+ [ chars
  • Episode - name of the table
  • \]* - 0+ ] chars
  • \. - a . char
  • \[* - 0+ [ chars
  • \w+ - 1+ word chars
  • \]* - 0+ ] chars
  • \s* - 0+ whitespaces
  • (<>|[><]?=|(?:NOT\s+)?IN) - Group 1: <>, <=, >=, =, NOT IN or IN
  • (\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*) - Group 2:
    • \s* - 0+ whitespace chars
    • \'* - 0+ ' chars
    • (\((.*?)\)+|NULL|\S+) - Group 3:
      • \( - a (
      • (.*?) - Group 4: any 0+ chars other than newline as few as possible
      • \)+ - 1+ ) chars
      • | - or
      • NULL - a NULL substring
      • | - or
      • \S+ - 1+ non-whitespace chars
    • \'* - 0+ ' chars
    • \s* - 0+ whitespaces
    • \)* - 0+ ) chars.
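
The order of the alternatives in Group 3 appears to be what produced the stray 'POD') in the question. Regex alternation takes the first branch that succeeds, so with \S+ listed before the parenthesised branch, the value stops at the first space inside the IN list. The sketch below isolates that effect; the simplified `prefix` pattern is an assumption made for the demonstration and is not the full pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

string clause = "AND [Episode].[TFC] NOT IN ('LWC', 'POD')";

// Simplified prefix (demo only): column name plus the NOT IN operator.
string prefix = @"AND\s+\[Episode\]\.\[\w+\]\s*NOT\s+IN\s*";

// Order used in the question: \S+ is tried before the parenthesised list.
string valueFirst = Regex.Match(clause, prefix + @"(NULL|\S+|\((.*?)\)+)").Value;

// Order used in this answer: the parenthesised list is tried first.
string listFirst = Regex.Match(clause, prefix + @"(\((.*?)\)+|NULL|\S+)").Value;

Console.WriteLine(valueFirst); // AND [Episode].[TFC] NOT IN ('LWC',
Console.WriteLine(listFirst);  // AND [Episode].[TFC] NOT IN ('LWC', 'POD')
```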

1 Comment

Pain in the rear, but I forgot parts like AND [Episode].[Source] IS LIKE '%6' and AND [Episode].[Source] IS NOT LIKE '%6'. Also, group 3 might need to handle NOT NULL. Can you advise on my update: AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN|(?:IS\s+)?LIKE)(\s*\'*(\((.*?)\)+|IS\s+(?:NOT\s+)?\s+NULL|\S+)\'*\s*\)*) ? My attempt at adding support for IS NULL and IS NOT NULL does not work.
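
Two things seem to stop the attempt above from matching IS NULL. First, the IS ... NULL branch sits in the value group, which is only reached after one of the operators in Group 1 has matched, and Episode.TFC IS NULL contains none of them. Second, IS\s+(?:NOT\s+)?\s+NULL asks for extra whitespace after the optional NOT, so it needs two spaces where the input has one. One possible way around both, sketched below under assumptions, is to make every alternative a complete predicate (operator plus operands). The names `StripTableClauses`, `value`, `column` and `predicate` are hypothetical, the method returns the trimmed string without the leading space of the original, and IS [NOT] LIKE is accepted only because the sample input uses it.

```csharp
using System;
using System.Text.RegularExpressions;

static string StripTableClauses(string whereClauses, string table)
{
    // A value: a quoted literal, or a bare token such as 10 or NULL.
    const string value = @"(?:'[^']*'|[^\s()']+)";
    // Table-qualified column, with optional square brackets.
    string column = $@"\[?{Regex.Escape(table)}\]?\.\[?\w+\]?";
    // Each alternative is a whole predicate, so IS NULL needs no separate value.
    string predicate = "(?:"
        + @"IS\s+(?:NOT\s+)?NULL"
        + $@"|(?:NOT\s+)?BETWEEN\s+{value}\s+AND\s+{value}"
        + @"|(?:NOT\s+)?IN\s*\([^)]*\)"
        + $@"|(?:IS\s+)?(?:NOT\s+)?LIKE\s*{value}"
        + $@"|(?:<>|!=|[<>]=?|=)\s*{value}"
        + ")";

    var r = new Regex($@"\bAND\s+\(*{column}\s*{predicate}\s*\)*\s*",
                      RegexOptions.IgnoreCase);
    return r.Replace(whereClauses, string.Empty).Trim();
}

// The full sample WHERE clause from the question's edit.
string whereClauses =
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) " +
    "AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' " +
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') " +
    "AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL";

// Prints:
// AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)
Console.WriteLine(StripTableClauses(whereClauses, "Episode"));
```

This is still pattern matching rather than parsing. It keeps the original habit of swallowing any ( before and ) after a clause, so a group that mixes Episode and non-Episode conditions would be left with unbalanced parentheses, and quoted values containing escaped quotes or clauses joined with OR are not handled.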