I would like to know the community's opinion on the best way to copy data from PostgreSQL to Redshift with Python 2.7.x. I can't use Amazon S3, and Redshift is essentially a PostgreSQL database, but it supports COPY only from S3 (which I can't use).
... and you can't use S3 because ...? – Craig Ringer, May 6, 2015 at 8:36
I have to perform all manipulations in parallel, and it's not supported to execute more than one COPY command at a time. Anyway, it's mostly a Python knowledge question. – Yuri Levinsky, May 6, 2015 at 8:51
The Redshift docs say they support COPY "... from files on Amazon S3, from a DynamoDB table, or from text output from one or more remote hosts", so you should be able to load in COPY data via python/psycopg2 as you wish. – Josh Kupershmidt, May 6, 2015 at 13:29
Josh, please see my previous comment. Technically it's possible, but the requirement is 10 parallel operations, which is impossible when I use COPY. – Yuri Levinsky, May 7, 2015 at 14:24
1 Answer
You can use Python/psycopg2/boto to code it end to end. An alternative to psycopg2 is the PostgreSQL command-line client (psql.exe).
If you use psycopg2 you can (see the sketch after this list):
- Spool to a file from PostgreSQL
- Upload to S3
- Append to the Redshift table.
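A minimal sketch of those three steps with psycopg2 and boto. The connection strings, `my-bucket`, `src_table`, `dst_table`, and the credential placeholders are assumptions for illustration, not from the original script:

```python
import psycopg2
import boto
from boto.s3.key import Key

AWS_ACCESS_KEY_ID = '...'       # placeholder credentials
AWS_SECRET_ACCESS_KEY = '...'

# 1. Spool a query result to a local CSV file from PostgreSQL.
pg = psycopg2.connect("host=pg-host dbname=source user=me")   # placeholder DSN
with open('extract.csv', 'w') as f:
    pg.cursor().copy_expert("COPY (SELECT * FROM src_table) TO STDOUT WITH CSV", f)
pg.close()

# 2. Upload the file to S3 with boto.
s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
k = Key(s3.get_bucket('my-bucket'))
k.key = 'extract.csv'
k.set_contents_from_filename('extract.csv')

# 3. Append the S3 file to the Redshift table with COPY.
rs = psycopg2.connect("host=rs-host port=5439 dbname=target user=me")  # placeholder DSN
cur = rs.cursor()
cur.execute("""
    COPY dst_table FROM 's3://my-bucket/extract.csv'
    CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
    CSV
""" % (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY))
rs.commit()
rs.close()
```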
If you use psql.exe you can:
- Pipe data from PostgreSQL to the S3 multipart uploader:
```python
from subprocess import Popen, PIPE

# Read the extract query from a file and build the psql.exe command line
# ('opt' holds the script's parsed command-line options).
in_qry = open(opt.pgres_query_file, "r").read().strip().strip(';')
db_client_dbshell = r'%s\bin\psql.exe' % PGRES_CLIENT_HOME.strip('"')
loadConf = [db_client_dbshell, '-U', opt.pgres_user,
            '-d', opt.pgres_db_name, '-h', opt.pgres_db_server]
q = """COPY ((%s) %s) TO STDOUT WITH DELIMITER ',' CSV %s""" % (in_qry, limit, quote)
#print q
# Feed the COPY command to psql; p2.stdout is the CSV stream to upload.
p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
p1.wait()
return p2   # returned from the surrounding function for the upload step
```
- Upload to S3.
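For this step, one way to feed `p2.stdout` to S3 is boto's multipart-upload API. The helper below is a sketch; the name `upload_stream_to_s3`, the bucket argument, and the 50 MB chunk size are my assumptions, not from the original script:

```python
import boto
from cStringIO import StringIO

AWS_ACCESS_KEY_ID = '...'       # placeholder credentials
AWS_SECRET_ACCESS_KEY = '...'

def upload_stream_to_s3(proc, bucket_name, key_name, chunk_size=50 * 1024 * 1024):
    # Read the psql process's stdout in chunks and push each chunk as one
    # part of an S3 multipart upload (every part but the last must be >= 5 MB).
    conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(key_name)
    part_num = 0
    while True:
        chunk = proc.stdout.read(chunk_size)
        if not chunk:
            break
        part_num += 1
        mp.upload_part_from_file(StringIO(chunk), part_num=part_num)
    mp.complete_upload()
```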
- Append to the Redshift table using psycopg2:
```python
import psycopg2

fn = 's3://%s' % location
conn_string = REDSHIFT_CONNECT_STRING.strip().strip('"')
con = psycopg2.connect(conn_string)
cur = con.cursor()

# Build optional COPY clauses from the command-line options.
quote = ''
if opt.red_quote:
    quote = 'quote \'%s\'' % opt.red_quote
ignoreheader = ''
if opt.red_ignoreheader:
    ignoreheader = 'IGNOREHEADER %s' % opt.red_ignoreheader
timeformat = ''
if opt.red_timeformat:
    #timeformat = " dateformat 'auto' "
    timeformat = " TIMEFORMAT '%s'" % opt.red_timeformat.strip().strip("'")

sql = """
COPY %s FROM '%s'
CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
DELIMITER '%s' FORMAT CSV %s GZIP %s %s;
COMMIT;
""" % (opt.red_to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
       opt.red_col_delim, quote, timeformat, ignoreheader)
cur.execute(sql)
con.close()
```
I did my best to compile all three steps into one script.
1 Comment
Thanks for your answer, but your script link is broken. Could you reshare it, please? – A.Ali