I have a file which contains, among other things, SQL CREATE TABLE commands. I want to collect all of these commands in a list (not implemented yet), with each command in a separate list entry.

My problem is that the regular expression only returns the first match, but there should be more.

Source file:

abcd
something
CREATE TABLE schema.test1(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;

efgh
something else
CREATE TABLE schema.test2(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;

something else
CREATE TABLE schema.test3(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;
something else
12346
higkl

My script only returns the first match:

CREATE TABLE schema.test1(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')

Script:

# -*- coding: utf-8 -*-
import os
import re

create_table_parts = []

atlfile = 'example.txt'
data = ''

def read_file(afile):
    with open(afile) as atl:
        text = atl.read()
        return text

data = read_file(atlfile)
data_utf8 = unicode(data, "utf-8")

round1 = re.search(r"(CREATE\sTABLE).+?(?=;)", data_utf8, re.MULTILINE|re.DOTALL)
print round1.group()

Could you tell me what's wrong here?

3 Answers

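re.search stops after the first match. You'd be better off using re.finditer, which yields a match object (like the one search returns) for every non-overlapping match:

someIter = re.finditer(r"(CREATE\sTABLE).+?(?=;)", data_utf8, re.MULTILINE|re.DOTALL)
for mObj in someIter:
    # process mObj; group() returns the full text of this match
    print mObj.group()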
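
To make the difference concrete, here is a minimal sketch in Python 3 syntax (the code in this thread is Python 2). The sample string is a shortened, made-up stand-in for the file contents, not the asker's actual data, and the capturing group is left out of the pattern because it is not needed here.

```python
import re

# Shortened stand-in for the file contents (an assumption for illustration).
sample = (
    "abcd\n"
    "CREATE TABLE schema.test1(a DECIMAL(28, 7) NULL)\n;\n"
    "efgh\n"
    "CREATE TABLE schema.test2(a DECIMAL(28, 7) NULL)\n;\n"
)

# DOTALL lets "." cross line breaks; the lookahead stops just before ";".
pattern = re.compile(r"CREATE\sTABLE.+?(?=;)", re.DOTALL)

# search() stops at the first match.
first = pattern.search(sample).group()

# finditer() yields one match object per non-overlapping match.
all_matches = [m.group() for m in pattern.finditer(sample)]
```

Here `first` holds only the test1 statement, while `all_matches` holds both statements.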

You could use findall instead, which returns all non-overlapping matches as a list; see https://docs.python.org/2/library/re.html#re.findall

2 Comments

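For some reason the positive lookahead isn't working with re.findall; it returns only the first part of the regex:
print re.findall(r"(CREATE\sTABLE).+?(?=;)", data_utf8, re.MULTILINE|re.DOTALL)
Returns: [u'CREATE TABLE', u'CREATE TABLE', u'CREATE TABLE']
Just use this regex instead: 'CREATE\sTABLE.+?(?=;)'. When a pattern contains a capturing group, re.findall returns the group's text rather than the whole match, so dropping the parentheses gives you the full statements.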
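
The findall behaviour described in these comments can be reproduced with a small sketch. It is written in Python 3 syntax, and the one-line input string is invented for illustration. The third variant, a non-capturing group, is not mentioned in the thread; it is shown as an option for cases where the parentheses are needed for grouping.

```python
import re

# Made-up input with two statements (an assumption for illustration).
text = "x CREATE TABLE t1(a INT); y CREATE TABLE t2(b INT); z"

# With a capturing group, findall() returns only the group's text.
with_group = re.findall(r"(CREATE\sTABLE).+?(?=;)", text, re.DOTALL)

# Without the group, findall() returns the whole match.
without_group = re.findall(r"CREATE\sTABLE.+?(?=;)", text, re.DOTALL)

# A non-capturing group (?:...) also keeps the whole match.
non_capturing = re.findall(r"(?:CREATE\sTABLE).+?(?=;)", text, re.DOTALL)
```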

Thanks to Mark's hint, here is a working solution:

# -*- coding: utf-8 -*-
import os
import re

create_table_parts = []
atlfile = 'example.txt'
data = ''

def read_file(afile):
    with open(afile) as atl:
        text = atl.read()
        return text

data = read_file(atlfile)
data_utf8 = unicode(data, "utf-8")


def round1_get_CT(text):
    match_list = []
    someIter = re.finditer(r"(CREATE\sTABLE).+?(?=;)", text, re.MULTILINE|re.DOTALL)
    for mObj in someIter:
        #print mObj.group()
        match_list.append(mObj.group())
    return match_list

create_table_parts = round1_get_CT(data_utf8)

print "\n".join(create_table_parts)
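
For readers on Python 3, a rough port of the same idea might look like the sketch below. The names get_create_tables and CREATE_TABLE_RE are hypothetical, chosen for this example. It assumes the file is UTF-8 encoded, as in the script above, and it uses findall with a group-free pattern instead of the finditer loop, which is a swap and not the original author's method.

```python
import re

# Compiled once; DOTALL lets "." cross line breaks inside a statement.
CREATE_TABLE_RE = re.compile(r"CREATE\sTABLE.+?(?=;)", re.DOTALL)


def get_create_tables(text):
    """Return every CREATE TABLE statement in text, without the trailing ';'."""
    return CREATE_TABLE_RE.findall(text)


def read_file(path):
    # Python 3 decodes while reading, so no separate unicode() call is needed.
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Calling `print("\n".join(get_create_tables(read_file("example.txt"))))` would then print all statements found in the file. Each entry keeps any whitespace that precedes the semicolon, because the lookahead only excludes the semicolon itself.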
