I have a program that takes restricted SQL Server WHERE clauses and removes the sections that target a certain table. An example of such a WHERE clause is:
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y')
I need to strip out every part of the clause that uses the Episode table, taking into account the parentheses that enclose statements and the square brackets around field names. To do this I have:
private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    Regex r = new Regex(
        $"AND\\s+\\(*\\[*{episodeTable}\\]*\\.\\[*\\w+\\]*\\s*(=|<>|<=|>=)(\\s*\\'*(NULL|\\S+|\\((.*?)\\)+)\\'*\\s*\\)*){{1}}",
        RegexOptions.IgnoreCase);
    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}
This works well, returning
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)
But now I have been asked to extend this to allow the full SQL WHERE clause syntax, so we could now have a WHERE clause like:
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND (Episode.Paste = 'Y') AND [Episode].[Source] = '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD')
which we have to "parse". So I have amended the method above to:
private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    Regex r = new Regex(
        $"AND\\s+\\(*\\[*{episodeTable}\\]*\\.\\[*\\w+\\]*\\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\\s+\\w+\\s+AND)(\\s*\\'*(NULL|\\S+|\\((.*?)\\)+)\\'*\\s*\\)*){{1}}",
        RegexOptions.IgnoreCase);
    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}
Using episodeTable = "Episode", I get back:
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) 'POD')
This is missing matches for AND (Episode.Paste = 'Y'), AND [Episode].[Source] = '%6' and AND [Episode].[TFC] NOT IN ('LWC', 'POD').
What is wrong with the regex, and how can I amend it to return what I want?
Rather than making this regex any more complex, can it be simplified?
Thanks for your time.
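For what it's worth, the leftover 'POD') can be reproduced in isolation. This is a minimal sketch: NotInDiagnosis, Original, Reordered and Strip are illustrative names, and the patterns are the amended one above with episodeTable = "Episode" substituted (written as C# verbatim strings instead of doubled backslashes). In the value group, \S+ is tried before \((.*?)\)+, so after NOT IN it greedily takes the non-space run ('LWC', and stops at the space; the replacement then leaves 'POD') behind. Putting the parenthesised-list alternative first lets it consume the whole list.

```csharp
using System.Text.RegularExpressions;

static class NotInDiagnosis
{
    // The amended pattern with episodeTable = "Episode".
    // In the value group, \S+ is tried BEFORE \((.*?)\)+, so "('LWC'," is
    // taken as a single non-space token and matching stops at the space.
    public const string Original =
        @"AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\s+\w+\s+AND)(\s*\'*(NULL|\S+|\((.*?)\)+)\'*\s*\)*){1}";

    // Same pattern, but the parenthesised-list alternative is tried first,
    // so "('LWC', 'POD')" is consumed as a whole.
    public const string Reordered =
        @"AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\s+\w+\s+AND)(\s*\'*(NULL|\((.*?)\)+|\S+)\'*\s*\)*){1}";

    // Removes every match of the given pattern, as the original method does.
    public static string Strip(string input, string pattern) =>
        Regex.Replace(input, pattern, string.Empty, RegexOptions.IgnoreCase).Trim();
}
```

For example, Strip("AND [Episode].[TFC] NOT IN ('LWC', 'POD')", NotInDiagnosis.Original) returns 'POD'), while the same call with NotInDiagnosis.Reordered returns an empty string. Run against the full WHERE clause above, the Original pattern reproduces the 'POD') output exactly.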
Edit: The answer below strips out some functionality I had before (my fault for not stating that I needed to keep it, and that is also what makes this so hard: capturing all cases). So I need to match this string:
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL
So in C#, I need the following code:
string whereClauses =
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) " +
    "AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' " +
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') " +
    "AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL";
// r is the Regex I am trying to write
string tmp = r.Replace(whereClauses, String.Empty).Trim();
so that tmp is:
AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)
That is, all of the Episode clauses are stripped out, including the BETWEEN, IS NOT NULL and IS NULL statements.
I have:
AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN|(?:IS\s+)?LIKE|(?:IS\s+NOT\s+)?LIKE|BETWEEN(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)AND)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)
But this is not matching
Episode.TFC IS NULL
What is wrong with the regex?
Comment: You're using regular expressions to modify SQL, which is an awful hack. Why can't you just modify the SQL?
Reply: … Episode parts of the WHERE clause to use in the subsequent join query. As with all things like this, why I am using this method is not always clear. I am using a regex here because it seems like a convenient way to do what I want without writing a full parser, which would be a lot more work.
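Staying with the regex approach: the last pattern does not match Episode.TFC IS NULL because every IS in its operator alternation must be followed by LIKE, so there is no branch for IS NULL or IS NOT NULL. A minimal sketch, assuming single-quoted values without embedded quotes and AND-only conjunctions, builds the pattern from one named piece per predicate form so each can be read and extended separately. WhereClauseFilter and BuildPattern are hypothetical names; BuildResourceWhereClauses keeps the original signature and its leading-space return format.

```csharp
using System.Text.RegularExpressions;

static class WhereClauseFilter
{
    // Hypothetical helper: matches one " AND <table>.<column> <predicate>" term.
    static Regex BuildPattern(string table)
    {
        string t = Regex.Escape(table);
        // Column reference with or without brackets: Episode.TFC or [Episode].[TFC].
        string column = $@"\[?{t}\]?\.\[?\w+\]?";
        // One value: a single-quoted string (no embedded quotes) or a bare number/identifier.
        string value = @"(?:'[^']*'|[\w.]+)";
        string predicate = "(?:" + string.Join("|",
            @"IS\s+(?:NOT\s+)?NULL",                              // IS NULL, IS NOT NULL
            $@"(?:IS\s+)?(?:NOT\s+)?LIKE\s+{value}",              // LIKE, NOT LIKE, IS NOT LIKE
            @"(?:NOT\s+)?IN\s*\([^)]*\)",                          // IN (...), NOT IN (...)
            $@"(?:NOT\s+)?BETWEEN\s+{value}\s+AND\s+{value}",     // BETWEEN x AND y
            $@"(?:<>|!=|<=|>=|<|>|=)\s*{value}") + ")";           // comparison operators
        // Consumes the whitespace before AND, any "(" before the column and any ")"
        // after the predicate, but not the whitespace that follows, so the next
        // clause keeps its leading space.
        return new Regex(
            $@"\s*\bAND\s+(?:\(\s*)*{column}\s*{predicate}(?:\s*\))*",
            RegexOptions.IgnoreCase);
    }

    public static string BuildResourceWhereClauses(string whereClauses, string episodeTable)
    {
        string tmp = BuildPattern(episodeTable).Replace(whereClauses, string.Empty).Trim();
        return $" {tmp}";
    }
}
```

With whereClauses set to the long string above and episodeTable = "Episode", this returns the expected tmp (prefixed with the space the original method adds). One limitation: parentheses are counted, not balanced, so a group that mixes tables, such as AND (Util.X = 1 AND Episode.Y = 2), would lose its closing parenthesis. Handling that properly needs .NET balancing groups or a real parser.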