With this regex I'm trying to extract the list of tables from a SQL select statement. It works fine with one exception: if the statement ends with the table name, there is no match. For example:

Given the regex:

(?:from|join)\s+(.*?)(?:\)|\s+(?:where|on|inner|outer|full|left|right|join|\s*$))

and given this string:

"select xxxx  from table1 t1, table2, table3 t3 "  (note the last space)

The match is:

"table1 t1, table2, table3 t3"   

But given this string:

"select xxxx  from table1 t1, table2, table3 t3"  (without last space)

There's no match. How to make this work?

  • Wow, that's going to be tough. What about fully qualified, escaped table names like [DataBase].[dbo].[TableName] (SQL Server) or `TableName` in MySQL (just two examples your regex won't hit)? Subqueries as well (select * from (select * from table)). I'm afraid regular expressions are not the right tool for this job. Commented May 15, 2015 at 20:29
  • I would rather parse it than try to match it (not saying it's not possible, but it would need some work at least). Commented May 15, 2015 at 20:30
  • SQL is not a regular language, so... parse it. Keep your sanity. Don't follow Cthulhu. ;) Commented May 15, 2015 at 20:41
  • I just need the table names; do I still need to parse it? If yes, how do I do that? Commented May 15, 2015 at 20:45
  • Can you guarantee that your input will stay simple? For instance, might you have to extract the tables from something of the form select count(*) from ( select foo.bar, baz.foo from foo, (select foo from herp) baz)? If so, matching just won't do it. Commented May 15, 2015 at 20:56

1 Answer

RegEx isn't very good at this, as it's a lot more complicated than it appears:

  • What if they use LEFT/RIGHT INNER/OUTER/CROSS/MERGE/NATURAL joins instead of the a,b syntax? The a,b syntax should be avoided anyway.
  • What about nested queries?
  • What if there is no table (selecting a constant)?
  • What about line breaks and other whitespace formatting?
  • Alias names?
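
To make the nested-query point concrete, here is a small Python sketch that runs the regex exactly as posted in the question against a made-up statement containing a subquery. The sample SQL and the variable names are illustrative assumptions, not something from the original post.

```python
import re

# The regex exactly as posted in the question (case-sensitive, unchanged).
ORIGINAL = re.compile(
    r"(?:from|join)\s+(.*?)(?:\)|\s+(?:where|on|inner|outer|full|left|right|join|\s*$))"
)

# Hypothetical input: the outer FROM is followed by a subquery.
nested = "select count(*) from (select foo from herp) baz"

# The lazy group runs until the first ")", so it swallows the start of
# the subquery instead of returning a table name.
captured = ORIGINAL.findall(nested)
print(captured)
```

The single capture is a fragment of the subquery rather than a table name, and `herp` never shows up as a result of its own, which is the kind of case a parser handles and a flat pattern does not.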

What you can do is use an SQL parser; there is a good one here.

See more answers in this post.

4 Comments

Just to be clear - it is the second \s that is the problem, so |\s+(?:where|... should really become |\s*(?:where|...
exactly the second one
I cannot make that change as from tablewhere would become a match and it should not
@pgschr you're right, and after some searching I updated my answer.
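
For the simple inputs in the question, one possible fix (a sketch, not a general SQL solution) is to keep `\s+` in front of the keywords and move `\s*$` out of that group, so that end-of-string no longer requires a preceding whitespace character. The Python below assumes the `re` module; the `\b` after the keyword list, the `re.IGNORECASE` flag, and the names `KEYWORDS`, `TABLES` and `table_lists` are additions for illustration.

```python
import re

# Keywords that may follow the table list (taken from the original regex).
KEYWORDS = r"where|on|inner|outer|full|left|right|join"

# Assumption: \s*$ is moved out of the \s+(...) group, so the end of the
# string can follow the table list directly. The \b after the keyword
# list is an extra guard against matching a keyword prefix of a longer word.
TABLES = re.compile(
    r"(?:from|join)\s+(.*?)(?:\)|\s+(?:" + KEYWORDS + r")\b|\s*$)",
    re.IGNORECASE,
)

def table_lists(sql):
    """Return the raw text captured after each FROM/JOIN."""
    return TABLES.findall(sql)

# With and without the trailing space, the same list is captured.
print(table_lists("select xxxx  from table1 t1, table2, table3 t3 "))
print(table_lists("select xxxx  from table1 t1, table2, table3 t3"))

# "tablewhere" stays one name, because keywords still need \s+ before them.
print(table_lists("select x from tablewhere"))
```

Because the keywords still require leading whitespace, `from tablewhere` is captured as the single name `tablewhere`, which addresses the objection to changing `\s+` into `\s*`. The caveats from the answer (subqueries, quoted identifiers, and so on) still apply.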
