
I would like to know the community's opinion on the best way to copy data from PostgreSQL to Redshift with Python 2.7.x. I can't use Amazon S3, and although Redshift behaves like a normal PostgreSQL database, it supports COPY only from S3 (which I can't use).

  • ... and you can't use S3 because ...? Commented May 6, 2015 at 8:36
  • I have to perform all manipulations in parallel, and executing more than one COPY command at a time isn't supported. Anyway, it's mostly a Python knowledge question. Commented May 6, 2015 at 8:51
  • The Redshift docs say they support COPY "... from files on Amazon S3, from a DynamoDB table, or from text output from one or more remote hosts" so you should be able to load in COPY data via python/psycopg2 as you wish. Commented May 6, 2015 at 13:29
  • Josh, please see my previous comment. Technically it's possible, but the requirement calls for 10 parallel operations, which is impossible if I use COPY. Commented May 7, 2015 at 14:24

1 Answer


You can use Python/psycopg2/boto to code it end-to-end. An alternative to psycopg2 is the PostgreSQL command-line client (psql.exe).

If you use psycopg2 you can:

  1. Spool to a file from PostgreSQL
  2. Upload to S3
  3. Append to the Redshift table (a minimal end-to-end sketch of this path follows the list).
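
For illustration, here is a minimal sketch of that flow, assuming psycopg2 and boto. The connection strings, bucket, key, and table names are placeholders, not values from the original script:

    import psycopg2
    import boto

    # Placeholder settings -- adjust for your environment.
    PG_CONN = "host=pg-host dbname=src user=me"
    REDSHIFT_CONN = "host=cluster.example.com port=5439 dbname=dw user=me"
    BUCKET = "my-bucket"
    KEY = "export/table1.csv"

    # 1. Spool the query result to a local CSV file.
    pg = psycopg2.connect(PG_CONN)
    cur = pg.cursor()
    with open("table1.csv", "w") as f:
        cur.copy_expert("COPY (SELECT * FROM table1) TO STDOUT WITH CSV", f)
    pg.close()

    # 2. Upload the file to S3 with boto.
    s3 = boto.connect_s3()
    key = s3.get_bucket(BUCKET).new_key(KEY)
    key.set_contents_from_filename("table1.csv")

    # 3. Append to the Redshift table with COPY.
    rs = psycopg2.connect(REDSHIFT_CONN)
    rs.cursor().execute("""
        COPY table1 FROM 's3://%s/%s'
        CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
        FORMAT CSV;
    """ % (BUCKET, KEY))
    rs.commit()
    rs.close()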

If you use psql.exe you can:

  1. Pipe data from PostgreSQL to an S3 multipart uploader

    from subprocess import Popen, PIPE

    # 'opt', 'limit', 'quote' and 'env' are built elsewhere in the
    # script from command-line options and the environment.
    in_qry = open(opt.pgres_query_file, "r").read().strip().strip(';')
    db_client_dbshell = r'%s\bin\psql.exe' % PGRES_CLIENT_HOME.strip('"')
    loadConf = [db_client_dbshell, '-U', opt.pgres_user,
                '-d', opt.pgres_db_name, '-h', opt.pgres_db_server]

    q = """
    COPY ((%s) %s)
    TO STDOUT
    WITH DELIMITER ','
    CSV %s
    """ % (in_qry, limit, quote)
    #print q

    # Feed the COPY command to psql via stdin; psql streams the CSV to stdout.
    p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
    p1.wait()
    return p2  # the caller reads the CSV stream from p2.stdout
    
  2. Upload to S3 (a sketch of a streaming multipart upload follows this step).
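
One possible shape for that uploader, sketched with boto; the function name and part size here are my own choices, not from the original script:

    import boto
    from cStringIO import StringIO

    def upload_multipart(stream, bucket_name, key_name):
        # 5 MB is the minimum S3 part size for all but the last part.
        part_size = 5 * 1024 * 1024
        conn = boto.connect_s3()
        mp = conn.get_bucket(bucket_name).initiate_multipart_upload(key_name)
        part_num = 0
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            part_num += 1
            mp.upload_part_from_file(StringIO(chunk), part_num)
        mp.complete_upload()

The stream argument can be the p2.stdout pipe returned in step 1, so the export never has to touch the local disk.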

  3. Append to the Redshift table using psycopg2.

    import psycopg2

    fn = 's3://%s' % location
    conn_string = REDSHIFT_CONNECT_STRING.strip().strip('"')
    con = psycopg2.connect(conn_string)
    cur = con.cursor()

    # Build the optional COPY clauses from command-line options.
    quote = ''
    if opt.red_quote:
        quote = 'QUOTE \'%s\'' % opt.red_quote
    ignoreheader = ''
    if opt.red_ignoreheader:
        ignoreheader = 'IGNOREHEADER %s' % opt.red_ignoreheader
    timeformat = ''
    if opt.red_timeformat:
        #timeformat = " dateformat 'auto' "
        timeformat = " TIMEFORMAT '%s'" % opt.red_timeformat.strip().strip("'")

    # COMMIT is part of the batch, so the load is committed before
    # the connection is closed.
    sql = """
    COPY %s FROM '%s'
    CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
    DELIMITER '%s'
    FORMAT CSV %s
    GZIP
    %s
    %s;
    COMMIT;
    """ % (opt.red_to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
           opt.red_col_delim, quote, timeformat, ignoreheader)
    cur.execute(sql)
    con.close()
    

I did my best to compile all 3 steps into one script.


1 Comment

Thanks for your answer, but your script link is broken. Could you reshare it please?
