
I have PostgreSQL tables and I want to run a SQL script file against these tables using Python, then write the results of the queries to a CSV file. The script file has multiple queries separated by semicolons (;). A sample script is shown below.

Script file:

--Duplication Check
select p.*, c.name
from scale_polygons_v3 c inner join cartographic_v3 p
on (metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20)) AND c.kind NOT IN (9,10)
where ST_Contains(c.geom, p.geom);

--Area Check
select sp.areaid,sp.name_displ,p.road_id,p.name
from scale_polygons_v3 sp, pak_roads_20162207 p
where st_contains(sp.geom,p.geom) and sp.kind = 1
and p.areaid != sp.areaid;

When I run the Python code, it executes successfully without any error, but the problem I am facing is writing the results of the queries to a CSV file: only the result of the last executed query is written. That is, the first query's result is overwritten by the second, the second by the third, and so on until the last query.

Here is my Python code:

import psycopg2
import sys
import csv
import datetime, time

def run_sql_file(filename, connection):
    '''
    The function takes a filename and a connection as input
    and will run the SQL query on the given connection
    '''
    start = time.time()

    file = open(filename, 'r')
    sql = " ".join(file.readlines())
    print "Start executing: " + " at " + str(datetime.datetime.now().strftime("%Y-%m-%d %H:%M")) + "\n"
    print "Query:\n", sql + "\n"
    cursor = connection.cursor()
    cursor.execute(sql)
    records = cursor.fetchall()
    with open('Report.csv', 'a') as f:
        writer = csv.writer(f, delimiter=',')
        for row in records:
            writer.writerow(row)
    connection.commit()
    end = time.time()
    row_count = sum(1 for row in records)
    print "Done Executing:", filename
    print "Number of rows returned:", row_count
    print "Time elapsed to run the query:",str((end - start)*1000) + ' ms'
    print "\t ==============================="

def main():    
    connection = psycopg2.connect("host='localhost' dbname='central' user='postgres' password='tpltrakker'")
    run_sql_file("script.sql", connection)
    connection.close()

if __name__ == "__main__":
    main()

What is wrong with my code?

  • probably not helpful, but the code looks good to me. You are opening the file with mode 'a' as is done here Commented Aug 25, 2016 at 10:57
  • Executing a script in a single call, you'll get the result only for the last command executed (or an error if something goes wrong). Commented Aug 25, 2016 at 11:42
  • All the queries have the same number of columns and are they of the same type (and order)? Commented Aug 25, 2016 at 11:46
  • @ClodoaldoNeto No, the queries have different numbers of columns and different types, i.e. one table is lines data and another is polygons data. Commented Aug 25, 2016 at 11:49
  • There are a lot of suggestions: a single query per file; parsing the file for every single command; ... My suggestion is to use cursors or prepared statements instead of plain queries and then fetch the results by their names. Commented Aug 25, 2016 at 11:57
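The "parse the file for every single command" suggestion can be sketched as below. This is a naive split on `;` (it breaks if a semicolon appears inside a string literal or dollar-quoted block), and the function and file names are just examples:

```python
import csv

def split_statements(sql_text):
    # Naive split on ';' -- fine for simple scripts, but it will
    # break on semicolons inside string literals or dollar quotes.
    return [s.strip() for s in sql_text.split(';') if s.strip()]

def run_script(sql_text, connection, out_path='Report.csv'):
    # Execute each statement separately so every result set,
    # not just the last one, reaches the CSV file.
    cursor = connection.cursor()
    with open(out_path, 'a') as f:
        writer = csv.writer(f)
        for stmt in split_statements(sql_text):
            cursor.execute(stmt)
            for row in cursor.fetchall():
                writer.writerow(row)
```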

2 Answers


If you are able to change the SQL script a bit, then here is a workaround:

#!/usr/bin/env python

import psycopg2

script = '''
    declare cur1 cursor for
        select * from (values(1,2),(3,4)) as t(x,y);

    declare cur2 cursor for
        select 'a','b','c';
    '''
print script

conn = psycopg2.connect('')

# Cursors exist and are available only inside the transaction
conn.autocommit = False

# Create the cursors from the script
conn.cursor().execute(script)

# Read the names of the open cursors
cursors = conn.cursor()
cursors.execute('select name from pg_cursors;')
cur_names = cursors.fetchall()

# Read data from each available cursor
for cname in cur_names:
    print cname[0]
    cur = conn.cursor()
    cur.execute('fetch all from ' + cname[0])
    rows = cur.fetchall()
    # Here you can save the data to the file
    print rows


conn.rollback()

print 'done'

Disclaimer: I am totally newbie with Python.
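To fill in the "save the data to the file" step above, the rows fetched from each named cursor could be appended with csv.writer. This is only a sketch; the function name and 'Report.csv' path are made up:

```python
import csv

def append_rows(rows, path='Report.csv'):
    # Append one cursor's fetched rows to a CSV file; reuse the
    # same path to collect every cursor's output in one report.
    with open(path, 'a') as f:
        writer = csv.writer(f)
        for row in rows:
            writer.writerow(row)
```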




The simplest approach is to output each query to a different file, using copy_expert:

# assumes an open psycopg2 connection and cursor, e.g.:
# conn = psycopg2.connect(...); cursor = conn.cursor()
query = '''
    select p.*, c.name
    from
        scale_polygons_v3 c
        inner join
        cartographic_v3 p on metaphone(c.name_displ, 20) LIKE metaphone(p.name, 20) and c.kind not in (9,10)
    where ST_Contains(c.geom, p.geom)
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()

query = '''
    select sp.areaid,sp.name_displ,p.road_id,p.name
    from scale_polygons_v3 sp, pak_roads_20162207 p
    where st_contains(sp.geom,p.geom) and sp.kind = 1 and p.areaid != sp.areaid
'''
copy = "copy ({}) to stdout (format csv)".format(query)
f = open('Report2.csv', 'wb')
cursor.copy_expert(copy, f, size=8192)
f.close()

If you want to append the second output to the same file, just keep the first file object open.

Note that COPY must output to STDOUT so that the data is available to copy_expert.
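The append-to-one-file variant can be sketched by keeping a single file object open across queries. Note that a trailing semicolon must be stripped before the query is wrapped in COPY (...), or the COPY statement is invalid SQL; the helper names here are hypothetical:

```python
def copy_sql(query):
    # Wrap a SELECT in COPY ... TO STDOUT; a trailing ';' inside
    # the parentheses would be a syntax error, so strip it first.
    return "copy ({}) to stdout (format csv)".format(query.strip().rstrip(';').strip())

def copy_queries_to_file(cursor, queries, path):
    # One open file object, so each query's rows are appended.
    with open(path, 'wb') as f:
        for q in queries:
            cursor.copy_expert(copy_sql(q), f, size=8192)
```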

3 Comments

  • This worked for me. OK, but I don't fully understand the code, as I have to make some further modifications.
  • @ShahzadBacha You mean why copy to stdout? I updated the answer. Otherwise, what part do you not understand?
  • Actually this should be a separate question, but anyway: how can I modify the code to also count the total number of rows returned from the query, and to write only a limited number of rows, like adding LIMIT 10 at the end of the query?
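For that last comment, one approach is to wrap the query in a subquery before handing it to copy_expert: copy out only the first N rows with LIMIT, and get the total with a companion count(*) query run via a plain cursor.execute/fetchone. The wrapper subquery is standard SQL; the helper names are made up:

```python
def limited(query, n=10):
    # Copy out only the first n rows of the query.
    return "select * from ({}) _q limit {}".format(query.strip().rstrip(';'), n)

def count_sql(query):
    # Companion query returning the total row count.
    return "select count(*) from ({}) _q".format(query.strip().rstrip(';'))
```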
