I want to extract the column names from this SQL statement using Python regex. The columns can be separated by different delimiters, such as a comma, CASE, or an opening parenthesis. The pattern is that a column name may contain uppercase letters, lowercase letters, underscores, or digits, and it must start with a letter (A-Za-z).
SELECT
year_period
,month
,replace(source, '\n', '') source Data_source
,CASE
WHEN left(r, 2) = 'XL'
THEN r
ELSE CONCAT (
'XL'
,r
)
END reg
,desc
,cc
,(
CASE
WHEN right(full, 1) = 'I'
THEN left(full, length(full) - 1)
ELSE full
END
) full
FROM schema1.table1
Expected output:
[year_period, month, Data_source, reg, desc, cc, full]
Our attempt with Python regex:
import re

# Read the whole SQL statement into a string
with open('SQL_Statement.txt') as f:
    sql = f.read()
print(sql)

pattern = r'T|\,|\s{1,}([A-Za-z]{1,}[A-Za-z0-9_]{1,})\s{1,}'
print(re.findall(pattern, sql))
sql-metadata (github.com/macbre/sql-metadata): check whether it is of use.
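A single findall pattern has trouble telling an output column apart from an identifier used inside an expression (for example r inside the CASE). One possible regex-based workaround, offered only as a rough sketch: cut out the text between SELECT and FROM, split it on commas that sit outside parentheses and quoted strings, and keep the last identifier of each piece. The helper names split_top_level and column_names are made up for this illustration. The sketch assumes a single SELECT ... FROM with no subqueries, single-quoted string literals without escaped quotes, and that every wanted column is the last token of its select item.

```python
import re

# An identifier that starts with a letter and ends the select item
IDENT_AT_END = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*)\s*$")


def split_top_level(select_list):
    """Split on commas that are outside parentheses and single-quoted strings."""
    items, current, depth, in_quote = [], [], 0, False
    for ch in select_list:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # Top-level comma: close the current select item
                items.append("".join(current))
                current = []
                continue
        current.append(ch)
    items.append("".join(current))
    return items


def column_names(sql):
    """Return the trailing identifier (name or alias) of each select item."""
    # Assumption: one SELECT ... FROM, no nested queries
    match = re.search(r"\bSELECT\b(.*)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return []
    names = []
    for item in split_top_level(match.group(1)):
        # Drop string literals so their contents are never mistaken for a name
        item = re.sub(r"'[^']*'", "", item)
        found = IDENT_AT_END.search(item)
        if found:  # items ending in ")" have no alias and are skipped
            names.append(found.group(1))
    return names
```

Calling column_names(sql) on the statement from the question should give year_period, month, Data_source, reg, desc, cc and full, in that order. For anything beyond simple statements like this one, a real SQL parser is likely the safer route.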