I have a CSV file with table names and the primary keys for those tables, in the format below:

|  Table Name  |  Primary Key  | 
|    Table 1   |     Col1      |  
|    Table 1   |     Col2      |
|    Table 1   |     Col3      | 
|    Table 2   |     Col11     | 
|    Table 2   |     Col12     | 

I want to run a SQL query to validate the PK constraint for every table. The query would look like this:

select Col1, Col2, Col3 from Table1
group by Col1, Col2, Col3
having count(*)>1 

But I have thousands of tables in this file. How would I write and execute this query dynamically and write the results to a flat file? I want to do this using Python 3.

Attempt:

My PKTest.py

def getColumns(filename):
    tables = {}

    with open(filename) as f:
        for line in f:
            line = line.strip()
            if 'Primary Key' in line:
                continue

            cols = line.split('|')
            table = cols[1].strip()
            col = cols[2].strip()

            if table in tables:
                tables[table].append(col)
            else:
                tables[table] = [col]
    return tables

def runSQL(table, columns):
    statement = 'select {0} from {1} group by {0} having count(*) > 1'.format(', '.join(columns), table.replace(' ',''))
    return statement

if __name__ == '__main__':
    tables = getColumns('PKTest.csv')
    try:
        # create the connection (ctx) and cursor here

        for table in tables:
            sql = runSQL(table, tables[table])
            print(sql)
            cursor.execute(sql)
            for result in cursor:
                print(result)

    finally:
        cursor.close()
        ctx.close()

1 Answer

You will have to improvise on this answer a bit since I do not have access to Oracle.

Let's assume there's a file called so.csv that contains the data as shown in your question.

Create a file called so.py like so. I'll add bits of code and some explanation. You can piece the file together or copy/paste it from here: https://rextester.com/JLQ73751.

At the top of the file, import your Oracle dependency:

# import cx_Oracle
# https://www.oracle.com/technetwork/articles/dsl/python-091105.html

Then, create a function that parses your so.csv and puts table and columns in a dictionary like this: {'Table 1': ['Col1', 'Col2', 'Col3'], 'Table 2': ['Col11', 'Col12']}

def get_tables_columns(filename):

    tables = {}

    with open(filename) as f:
        for line in f:
            line = line.strip()
            if 'Primary Key' in line:
                continue

            cols = line.split('|')

            table = cols[1].strip()
            col = cols[2].strip()

            if table in tables:
                tables[table].append(col)
            else:
                tables[table] = [col]

    return tables
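
One caveat with the parser above: a blank line, or any line without `|` separators, splits into fewer than three fields, so `cols[1]` raises `IndexError`. Here is a minimal sketch of a more tolerant variant, assuming the same pipe-delimited layout as in the question. The name `get_tables_columns_safe` is made up for illustration, and it takes any iterable of lines (an open file works) so it is easy to try out:

```python
from collections import defaultdict


def get_tables_columns_safe(lines):
    """Map each table name to its list of key columns, skipping bad lines."""
    tables = defaultdict(list)
    for line in lines:
        # '| Table 1 | Col1 |' -> ['', 'Table 1', 'Col1', '']
        cols = [c.strip() for c in line.strip().split('|')]
        if len(cols) < 3 or not cols[1] or not cols[2]:
            continue  # blank or malformed line
        if cols[2] == 'Primary Key':
            continue  # header row
        tables[cols[1]].append(cols[2])
    return dict(tables)


sample = [
    '|  Table Name  |  Primary Key  | ',
    '',
    '|    Table 1   |     Col1      |  ',
    '|    Table 1   |     Col2      |',
    '|    Table 2   |     Col11     | ',
    'not a table row',
]
print(get_tables_columns_safe(sample))
```

With a real file you would call it as `with open('so.csv') as f: tables = get_tables_columns_safe(f)`.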

Then, create a function that generates the SQL statement given a table and its list of columns:

def get_sql(table, columns):

    statement = 'select {0} from {1} group by {0} having count(*) > 1'.format(
            ', '.join(columns),
            table.replace(' ', '')
        )

    return statement
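
As a quick check of what the generator produces, here is a sketch that repeats the same formatting and adds one guard that is not in the code above: since the names come from a file and are pasted straight into the SQL text, it rejects anything that does not look like a plain identifier. The helper name `build_pk_check_sql` and the regular expression are assumptions for illustration; loosen the pattern if your schema uses quoted or unusual names.

```python
import re

# Assumption: table and column names are plain, unquoted identifiers.
_IDENT = re.compile(r'^[A-Za-z_][A-Za-z0-9_$#]*$')


def build_pk_check_sql(table, columns):
    """Build the duplicate-key query for one table."""
    table_name = table.replace(' ', '')
    for name in [table_name, *columns]:
        if not _IDENT.match(name):
            raise ValueError('unexpected identifier: {!r}'.format(name))
    col_list = ', '.join(columns)
    return 'select {0} from {1} group by {0} having count(*) > 1'.format(
        col_list, table_name)


print(build_pk_check_sql('Table 1', ['Col1', 'Col2', 'Col3']))
# select Col1, Col2, Col3 from Table1 group by Col1, Col2, Col3 having count(*) > 1
```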

It's time to execute the functions:

if __name__ == '__main__':
    tables = get_tables_columns('so.csv')

    # here goes your code to connect with Oracle
    # con = cx_Oracle.connect('pythonhol/[email protected]/orcl')
    # cur = con.cursor()

    for table in tables:
        sql = get_sql(table, tables[table])
        print(sql)

        # here goes your sql statement execution            
        # cur.execute(sql)
        # for result in cur:
        #    print(result)

    # close your Oracle connection
    # con.close()

You can include your Oracle-related statements and run the python file.
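
To write the query results to a flat file from Python rather than printing them, something like the following could work. Treat it as a sketch: since I can't test against Oracle, it uses the standard-library `sqlite3` module as a stand-in, on the assumption that your Oracle driver's cursor offers the same DB-API `execute` and `fetchall` calls. The function name `write_duplicates`, the output layout (table name followed by the duplicated key values), and the sample table are my own choices.

```python
import csv
import io
import sqlite3


def write_duplicates(cursor, tables, out):
    """Run the duplicate check per table; write one CSV row per violating key."""
    writer = csv.writer(out)
    for table, columns in tables.items():
        col_list = ', '.join(columns)
        sql = 'select {0} from {1} group by {0} having count(*) > 1'.format(
            col_list, table.replace(' ', ''))
        try:
            cursor.execute(sql)
            rows = cursor.fetchall()
        except Exception as exc:  # e.g. missing table: record it and keep going
            writer.writerow([table, 'ERROR', str(exc)])
            continue
        for row in rows:
            writer.writerow([table, *row])


# Demo with an in-memory SQLite database standing in for Oracle.
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('create table Table1 (Col1, Col2)')
cur.executemany('insert into Table1 values (?, ?)',
                [(1, 'a'), (1, 'a'), (2, 'b')])

out = io.StringIO()  # for a real file: open('pk_violations.csv', 'w', newline='')
write_duplicates(cur, {'Table 1': ['Col1', 'Col2']}, out)
con.close()
print(out.getvalue())
```

Catching the error per table means one missing or misnamed table does not stop a run over thousands of tables; the failure is recorded in the output file instead.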

6 Comments

This looks good. I haven't tested it yet. How about writing this to a flat file? Appending?
Yeah, you can just execute this in its current form like so: python3 so.py > outputfile.sql. That'll give you outputfile.sql.
I am getting the error below: Traceback (most recent call last): File "PKTest.py", line 28, in <module> tables = getColumuns('PK Test.csv') File "PKTest.py", line 14, in getColumuns table = cols[1].strip() IndexError: list index out of range
Also, did you add cols = line.split('|') based on the table above? I don't think it's needed if we're reading from a CSV.
Yes, splitting is done based on | from your example. If you have an actual CSV, it'd be best to publish an example in your question. You can adapt the split for CSV. What does your PKTest.py look like? You might want to include that in your edited question and I'll be happy to help.
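
If the file is a real comma-separated CSV rather than the pipe-delimited table shown in the question, the standard `csv` module can replace the manual split. A sketch, assuming a header row whose column names are exactly `Table Name` and `Primary Key` (adjust these to the actual headers; `read_pk_csv` is a made-up name):

```python
import csv
import io
from collections import defaultdict


def read_pk_csv(f):
    """Map each table name to its key columns from a comma-separated file."""
    tables = defaultdict(list)
    reader = csv.DictReader(f, skipinitialspace=True)
    for row in reader:
        # Missing fields come back as None, so fall back to ''.
        table = (row.get('Table Name') or '').strip()
        col = (row.get('Primary Key') or '').strip()
        if table and col:
            tables[table].append(col)
    return dict(tables)


sample = io.StringIO(
    'Table Name,Primary Key\n'
    'Table 1,Col1\n'
    'Table 1,Col2\n'
    '\n'
    'Table 2,Col11\n'
)
print(read_pk_csv(sample))
```

For a file on disk, open it with `open('PKTest.csv', newline='')` and pass the file object in.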