
I'm trying to insert multiple rows into an Amazon Redshift database. The rows are contained in a list of tuples that looks like this:

    my_rows=[(1, 0.0, 0, 0.0, 2010188534, 1816780086, 1113834, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 1, 0.0, 2010188536, 1816780086, 1119396, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 2, 0.0, 2010188538, 1816780086, 1119398, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 3, 0.0, 2010188540, 1816780086, 1123612, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 0, 0.0, 2010188542, 1816780102, 1086852, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 1, 0.0, 2010188544, 1816780102, 1087014, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 2, 0.0, 2010188546, 1816780102, 1089224, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 3, 0.0, 2010188548, 1816780102, 1089348, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 4, 0.0, 2010188550, 1816780102, 1122564, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17')]

Some columns may contain None.

I'm inserting them row by row into the Redshift database this way:

    cur = con.cursor()
    # Build the column list as a string; concatenating a tuple into the
    # query string would raise a TypeError.
    columns_names = "(c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)"
    # One %s placeholder per column: "(%s, %s, ..., %s)"
    values_references = "(" + ", ".join(["%s"] * len(my_rows[0])) + ")"
    insert_query = "INSERT INTO " + table + " " + columns_names + " VALUES " + values_references + ";"
    for row in my_rows:
        cur.execute(insert_query, row)  # no need to recreate the cursor each pass
    con.commit()

The problem is that when I run this code, it blocks on the first row without raising any error. So my questions are: is it normal that inserting one row takes so long? If not, is there an error in my code? And is there a more efficient way to do this?

Can I get some help please? Thank you in advance.

5 Comments
  • This method will be very slow - how many rows are you hoping to insert? How often? Commented Mar 30, 2018 at 9:37
  • I want to insert about 3000 rows Commented Mar 30, 2018 at 9:40
  • One-off, or how often? Commented Mar 30, 2018 at 9:42
  • In fact, right now I'm just testing whether the data fits into the database, but later I'll be ingesting data once every 15 minutes Commented Mar 30, 2018 at 9:48
  • Avoid using the INSERT command to insert single rows into Redshift. You should be using the COPY command. See: Amazon Redshift Best Practices for Loading Data Commented Mar 30, 2018 at 10:39

1 Answer


The process you should follow:

  1. Write your data in CSV format to an S3 folder, ideally gzipped
  2. Run a Redshift COPY command to import that data into a temporary table in Redshift
  3. Run Redshift SQL to insert that data into your target table

That will run fast, is the correct and recommended way, and will scale.
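A minimal sketch of that pipeline, assuming boto3 and psycopg2, with the my_rows list and con connection from the question; the bucket name, key, IAM role ARN, and my_table are all placeholders, not something from the question:

    import csv
    import gzip
    import io

    import boto3

    # --- hypothetical names; adjust to your environment ---
    BUCKET = "my-bucket"
    KEY = "staging/my_rows.csv.gz"
    IAM_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftCopyRole"

    # 1. Write the rows as gzipped CSV and upload to S3.
    #    csv.writer turns None into an empty field.
    text_buf = io.StringIO()
    csv.writer(text_buf).writerows(my_rows)
    payload = gzip.compress(text_buf.getvalue().encode("utf-8"))
    boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=payload)

    # 2. COPY into a temporary staging table (con as in the question).
    cur = con.cursor()
    cur.execute("CREATE TEMP TABLE staging (LIKE my_table);")
    cur.execute(
        "COPY staging FROM 's3://{}/{}' "
        "IAM_ROLE '{}' CSV GZIP EMPTYASNULL;".format(BUCKET, KEY, IAM_ROLE)
    )

    # 3. Insert from the staging table into the target table, then commit.
    cur.execute("INSERT INTO my_table SELECT * FROM staging;")
    con.commit()

One COPY of 3000 rows is a single round trip and lets Redshift load in parallel, instead of 3000 single-row INSERT statements.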


5 Comments

First, thanks a lot for your answer. I have data in AWS S3 in JSON format; do I have to use a Lambda function to convert the past and future files into CSV format?
The Amazon Redshift COPY command can also COPY from JSON format (see the sketch after these comments).
Plus, if you want another option, you can define an external table in Redshift (Spectrum) to access that JSON data.
In fact it's a little more complicated: my JSON is structured this way: {table: "x", action: "insert", row:{c1:1, c2:"a"}}{table: "x", action: "delete", row:{c1:1, c2:"a"}}, so it needs to be processed before copying the files into the database. Besides, I need the data to be copied automatically in real time, which as far as I know the COPY command doesn't do.
You could also consider AWS Kinesis.
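As a rough illustration of the JSON route mentioned above: a COPY with a jsonpaths file can pick the nested row fields out of documents shaped like the one in the comment. The bucket, table, role, and jsonpaths file below are hypothetical, and note that COPY alone cannot filter on the action field, so mixed insert/delete documents would still need a preprocessing or staging step:

    # Hypothetical jsonpaths file at s3://my-bucket/jsonpaths.json, mapping
    # each target column to a nested "row" field:
    #   {"jsonpaths": ["$.row.c1", "$.row.c2"]}
    cur = con.cursor()  # con as in the question
    cur.execute(
        "COPY my_table (c1, c2) "
        "FROM 's3://my-bucket/events/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole' "
        "JSON 's3://my-bucket/jsonpaths.json';"
    )
    con.commit()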
