
I would like to know the community's opinion on the best way to copy data from PostgreSQL to Redshift with Python 2.7.x. I can't use Amazon S3, and although Redshift behaves like a normal PostgreSQL database, it supports COPY only from S3 (which I can't use).

  • ... and you can't use S3 because ...? Commented May 6, 2015 at 8:36
  • I have to perform all manipulations in parallel, and executing more than one COPY command at a time isn't supported. Anyway, it's mostly a Python knowledge question. Commented May 6, 2015 at 8:51
  • The Redshift docs say they support COPY "... from files on Amazon S3, from a DynamoDB table, or from text output from one or more remote hosts" so you should be able to load in COPY data via python/psycopg2 as you wish. Commented May 6, 2015 at 13:29
  • Josh, please see my previous comment. Technically it's possible, but the requirement calls for 10 parallel operations, which is impossible if I use COPY. Commented May 7, 2015 at 14:24

1 Answer


You can use Python/psycopg2/boto to code it end-to-end. An alternative to psycopg2 is the PostgreSQL command-line client (psql.exe).

If you use psycopg2 you can:

  1. Spool to a file from PostgreSQL
  2. Upload to S3
  3. Append to the Redshift table (a minimal end-to-end sketch of this path follows the list).
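
For illustration, here is a minimal sketch of that flow, assuming psycopg2 and boto. The connection strings, bucket, key, and table names are placeholders, not values from the original script:

    import psycopg2
    import boto

    # Placeholder settings -- adjust for your environment.
    PG_CONN = "host=pg-host dbname=src user=me"
    REDSHIFT_CONN = "host=cluster.example.com port=5439 dbname=dw user=me"
    BUCKET = "my-bucket"
    KEY = "export/table1.csv"

    # 1. Spool the query result to a local CSV file.
    pg = psycopg2.connect(PG_CONN)
    cur = pg.cursor()
    with open("table1.csv", "w") as f:
        cur.copy_expert("COPY (SELECT * FROM table1) TO STDOUT WITH CSV", f)
    pg.close()

    # 2. Upload the file to S3 with boto.
    s3 = boto.connect_s3()
    key = s3.get_bucket(BUCKET).new_key(KEY)
    key.set_contents_from_filename("table1.csv")

    # 3. Append to the Redshift table with COPY.
    rs = psycopg2.connect(REDSHIFT_CONN)
    rs.cursor().execute("""
        COPY table1 FROM 's3://%s/%s'
        CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
        FORMAT CSV;
    """ % (BUCKET, KEY))
    rs.commit()
    rs.close()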

If you use psql.exe you can:

  1. Pipe data from PostgreSQL to an S3 multipart uploader

    from subprocess import Popen, PIPE

    # 'opt', 'limit', 'quote' and 'env' are built elsewhere in the
    # script from command-line options and the environment.
    in_qry = open(opt.pgres_query_file, "r").read().strip().strip(';')
    db_client_dbshell = r'%s\bin\psql.exe' % PGRES_CLIENT_HOME.strip('"')
    loadConf = [db_client_dbshell, '-U', opt.pgres_user,
                '-d', opt.pgres_db_name, '-h', opt.pgres_db_server]

    q = """
    COPY ((%s) %s)
    TO STDOUT
    WITH DELIMITER ','
    CSV %s
    """ % (in_qry, limit, quote)
    #print q

    # Feed the COPY command to psql via stdin; psql streams the CSV to stdout.
    p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
    p1.wait()
    return p2  # the caller reads the CSV stream from p2.stdout
    
  2. Upload to S3 (a sketch of a streaming multipart upload follows this step).
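
One possible shape for that uploader, sketched with boto; the function name and part size here are my own choices, not from the original script:

    import boto
    from cStringIO import StringIO

    def upload_multipart(stream, bucket_name, key_name):
        # 5 MB is the minimum S3 part size for all but the last part.
        part_size = 5 * 1024 * 1024
        conn = boto.connect_s3()
        mp = conn.get_bucket(bucket_name).initiate_multipart_upload(key_name)
        part_num = 0
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            part_num += 1
            mp.upload_part_from_file(StringIO(chunk), part_num)
        mp.complete_upload()

The stream argument can be the p2.stdout pipe returned in step 1, so the export never has to touch the local disk.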

  3. Append to the Redshift table using psycopg2.

    import psycopg2

    fn = 's3://%s' % location
    conn_string = REDSHIFT_CONNECT_STRING.strip().strip('"')
    con = psycopg2.connect(conn_string)
    cur = con.cursor()

    # Build the optional COPY clauses from command-line options.
    quote = ''
    if opt.red_quote:
        quote = 'QUOTE \'%s\'' % opt.red_quote
    ignoreheader = ''
    if opt.red_ignoreheader:
        ignoreheader = 'IGNOREHEADER %s' % opt.red_ignoreheader
    timeformat = ''
    if opt.red_timeformat:
        #timeformat = " dateformat 'auto' "
        timeformat = " TIMEFORMAT '%s'" % opt.red_timeformat.strip().strip("'")

    # COMMIT is part of the batch, so the load is committed before
    # the connection is closed.
    sql = """
    COPY %s FROM '%s'
    CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
    DELIMITER '%s'
    FORMAT CSV %s
    GZIP
    %s
    %s;
    COMMIT;
    """ % (opt.red_to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
           opt.red_col_delim, quote, timeformat, ignoreheader)
    cur.execute(sql)
    con.close()
    

I did my best to compile all 3 steps into one script.


1 Comment

Thanks for your answer, but your script link is broken. Could you reshare it please?
