
I'm trying to insert multiple rows into an Amazon Redshift database. The rows are contained in a list of tuples that looks like this:

    my_rows=[(1, 0.0, 0, 0.0, 2010188534, 1816780086, 1113834, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 1, 0.0, 2010188536, 1816780086, 1119396, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 2, 0.0, 2010188538, 1816780086, 1119398, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.0, 3, 0.0, 2010188540, 1816780086, 1123612, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 0, 0.0, 2010188542, 1816780102, 1086852, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.5, 1, 0.0, 2010188544, 1816780102, 1087014, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 2, 0.0, 2010188546, 1816780102, 1089224, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 3, 0.0, 2010188548, 1816780102, 1089348, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17'), (1, 0.3, 4, 0.0, 2010188550, 1816780102, 1122564, '2018-03-07 09:40:17', '2018-03-07 09:40:17', '2018-03-07 09:40:17')]

Some columns may contain None.

I'm inserting them row by row into the Redshift database this way:

    cur = con.cursor()
    # Build the column list as a string; concatenating a tuple into the
    # query string would raise a TypeError.
    columns_names = "(c1, c2, c3, c4, c5, c6, c7, c8, c9, c10)"
    # One %s placeholder per column: "(%s, %s, ..., %s)"
    values_references = "(" + ", ".join(["%s"] * len(my_rows[0])) + ")"
    insert_query = "INSERT INTO " + table + " " + columns_names + " VALUES " + values_references + ";"
    for row in my_rows:
        cur.execute(insert_query, row)  # no need to recreate the cursor each pass
    con.commit()

The problem is that when I run this code, it blocks on the first row without raising any error. So my questions are: is it normal that inserting one row takes so long? If not, is there an error in my code? And is there a more efficient way to do this?

Can I get some help please? Thank you in advance.

5 Comments
  • This method will be very slow - how many rows are you hoping to insert? How often? Commented Mar 30, 2018 at 9:37
  • I want to insert about 3000 rows Commented Mar 30, 2018 at 9:40
  • One-off, or how often? Commented Mar 30, 2018 at 9:42
  • In fact, right now I'm just testing whether the data fits into the database, but later I'll be ingesting data once every 15 minutes Commented Mar 30, 2018 at 9:48
  • Avoid using the INSERT command to insert single rows into Redshift. You should be using the COPY command. See: Amazon Redshift Best Practices for Loading Data Commented Mar 30, 2018 at 10:39

1 Answer


The process you should follow:

  1. Write your data in CSV format to an S3 folder, ideally gzipped
  2. Run a Redshift COPY command to import that data into a temporary table in Redshift
  3. Run Redshift SQL to insert that data into your target table

That will run fast, is the correct and recommended way, and will scale.
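A minimal sketch of that pipeline, assuming boto3 and psycopg2, with the my_rows list and con connection from the question; the bucket name, key, IAM role ARN, and my_table are all placeholders, not something from the question:

    import csv
    import gzip
    import io

    import boto3

    # --- hypothetical names; adjust to your environment ---
    BUCKET = "my-bucket"
    KEY = "staging/my_rows.csv.gz"
    IAM_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftCopyRole"

    # 1. Write the rows as gzipped CSV and upload to S3.
    #    csv.writer turns None into an empty field.
    text_buf = io.StringIO()
    csv.writer(text_buf).writerows(my_rows)
    payload = gzip.compress(text_buf.getvalue().encode("utf-8"))
    boto3.client("s3").put_object(Bucket=BUCKET, Key=KEY, Body=payload)

    # 2. COPY into a temporary staging table (con as in the question).
    cur = con.cursor()
    cur.execute("CREATE TEMP TABLE staging (LIKE my_table);")
    cur.execute(
        "COPY staging FROM 's3://{}/{}' "
        "IAM_ROLE '{}' CSV GZIP EMPTYASNULL;".format(BUCKET, KEY, IAM_ROLE)
    )

    # 3. Insert from the staging table into the target table, then commit.
    cur.execute("INSERT INTO my_table SELECT * FROM staging;")
    con.commit()

One COPY of 3000 rows is a single round trip and lets Redshift load in parallel, instead of 3000 single-row INSERT statements.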


5 Comments

First, thanks a lot for your answer. I have data in AWS S3 in JSON format; do I have to use a Lambda function to convert the past and future files into CSV format?
The Amazon Redshift COPY command can also COPY from JSON format (see the sketch after these comments).
Plus, if you want another option, you can define an external table in Redshift (Spectrum) to access that JSON data.
In fact it's a little more complicated: my JSON is structured this way: {table: "x", action: "insert", row:{c1:1, c2:"a"}}{table: "x", action: "delete", row:{c1:1, c2:"a"}}, so it needs to be processed before copying the files into the database. Besides, I need the data to be copied automatically in real time, which as far as I know the COPY command doesn't do.
You could also consider AWS Kinesis.
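As a rough illustration of the JSON route mentioned above: a COPY with a jsonpaths file can pick the nested row fields out of documents shaped like the one in the comment. The bucket, table, role, and jsonpaths file below are hypothetical, and note that COPY alone cannot filter on the action field, so mixed insert/delete documents would still need a preprocessing or staging step:

    # Hypothetical jsonpaths file at s3://my-bucket/jsonpaths.json, mapping
    # each target column to a nested "row" field:
    #   {"jsonpaths": ["$.row.c1", "$.row.c2"]}
    cur = con.cursor()  # con as in the question
    cur.execute(
        "COPY my_table (c1, c2) "
        "FROM 's3://my-bucket/events/' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole' "
        "JSON 's3://my-bucket/jsonpaths.json';"
    )
    con.commit()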
