I want to extract the column names from this SQL statement using a Python regex. A column can be preceded by different delimiters, such as a comma, CASE, or an opening parenthesis. A column name consists of uppercase letters, lowercase letters, underscores, or digits, and it must start with a letter (A-Za-z).

SELECT 
    year_period
    ,month
    ,replace(source, '\n', '') source Data_source
    ,CASE 
        WHEN left(r, 2) = 'XL'
            THEN r
        ELSE CONCAT (
                'XL'
                ,r
                )
        END reg
    ,desc
    ,cc
    ,(
        CASE 
            WHEN right(full, 1) = 'I'
                THEN left(full, length(full) - 1)
            ELSE full
            END
        ) full

FROM schema1.table1

Expected output:

[year_period, month, Data_source, reg, desc, cc, full]

Our attempt with a Python regex:


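import re

# Read the whole SQL statement into a string
with open('SQL_Statement.txt') as f:
    sql = f.read()

print(sql)

# Attempted pattern: matches a literal 'T', a comma, or a whitespace-delimited identifier
pattern = r'T|\,|\s{1,}([A-Za-z]{1,}[A-Za-z0-9_]{1,})\s{1,}'

re.findall(pattern, sql)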
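For comparison, here is a minimal sketch of one possible alternative to a single regex. It assumes the statement has one top-level SELECT ... FROM, that string literals use single quotes, and that every expression ends with its alias. It splits the select list on commas that sit outside parentheses and quotes, then takes the last identifier of each item. The helper names `split_top_level` and `column_names` are hypothetical, and this is not a full SQL parser: an unaliased expression ending in `END`, or a subquery containing its own FROM, would give wrong results.

```python
import re

# Identifier as described above: starts with a letter, then letters, digits or underscores
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def split_top_level(select_list):
    """Split on commas that are outside parentheses and single-quoted strings."""
    items, current = [], []
    depth, in_quote = 0, False
    for ch in select_list:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # Top-level comma: close the current item and start a new one
                items.append("".join(current))
                current = []
                continue
        current.append(ch)
    items.append("".join(current))
    return items


def column_names(sql):
    """Return the last identifier of each top-level item in the select list."""
    # Assumption: a single SELECT ... FROM; the greedy match runs to the last FROM
    m = re.search(r"\bSELECT\b(.*)\bFROM\b", sql, flags=re.IGNORECASE | re.DOTALL)
    if m is None:
        return []
    names = []
    for item in split_top_level(m.group(1)):
        idents = IDENT.findall(item)
        if idents:
            names.append(idents[-1])
    return names


example = "SELECT a, left(b, 2) b2, CASE WHEN c = 'x,y' THEN 1 ELSE 2 END flag FROM t"
print(column_names(example))  # ['a', 'b2', 'flag']
```

Tracking parenthesis depth is what keeps the commas inside `left(r, 2)` and `CONCAT (...)` from being treated as column separators, which a flat regex cannot easily express.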

