Bulk insert into vertica using Python

Question

I am using Python to transfer data (~8 million rows) from Oracle to Vertica. My script transfers the data in about 2 hours, and I am looking for ways to speed up the transfer.

The process I am using:

Connect to Oracle
Pull the data into a dataframe (pandas)
Iterate over the rows in the dataframe one by one and insert each into Vertica with cursor.execute. I wanted to use the DataFrame.to_sql method, but it supports only a couple of databases.

Has anybody used a better way (bulk inserts or any other method) to insert data into Vertica using Python?

Here is the code snippet:

import pandas
import pyodbc

# sql is the Oracle query and conn the open Oracle connection (defined elsewhere)
df = pandas.read_sql_query(sql, conn)
conn_vertica = pyodbc.connect("DSN=dsnname")
cursor = conn_vertica.cursor()

# One INSERT statement per row: this loop is the slow part
for i, row in df.iterrows():
    cursor.execute("insert into <tablename> values(?,?,?,?,?,?,?,?,?)", *row.values[:9])

cursor.close()
conn_vertica.commit()
conn_vertica.close()

Saurabh Saxena · Accepted Answer · 2015-11-05 23:17:58Z

5

From the vertica-python source code (https://github.com/uber/vertica-python/blob/master/vertica_python/vertica/cursor.py):

with open("/tmp/file.csv", "rb") as fs:
    cursor.copy("COPY table(field1,field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'", fs, buffer_size=65536)
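To connect this to the dataframe in the question, the rows first have to reach a file. The following is a minimal sketch of that step, not part of the vertica-python library: the helper names `dump_rows_to_csv` and `copy_csv_file` are hypothetical, `cursor` is assumed to be a vertica-python cursor, and the table and column names are trusted input (they are pasted into the statement unescaped). Rows could come from something like `df.itertuples(index=False)`.

```python
import csv


def dump_rows_to_csv(rows, path):
    """Write an iterable of row tuples to `path` as comma-separated text."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        # The csv module's default quoting wraps only fields that contain a
        # comma, quote, or newline in double quotes, which pairs with the
        # ENCLOSED BY '"' option used in the COPY statement below.
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerows(rows)


def copy_csv_file(cursor, path, table, columns, buffer_size=65536):
    """Stream the CSV file at `path` into `table` with one COPY statement."""
    statement = (
        f"COPY {table} ({', '.join(columns)}) "
        "FROM STDIN DELIMITER ',' ENCLOSED BY '\"'"
    )
    # Binary mode, as in the snippet above; the driver reads the file in chunks.
    with open(path, "rb") as fs:
        cursor.copy(statement, fs, buffer_size=buffer_size)
    return statement
```

As in the question's code, the connection still needs a commit after the load. Fields that contain embedded double quotes may need different escaping than the csv module's default, so treat the quoting here as a starting point.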

edited Nov 5, 2015 at 23:17

answered Nov 5, 2015 at 22:51

Saurabh Saxena


2 Comments

Waqas Ali Over a year ago

I want to run this command from Python. How can I do this? copy cb.table_format2 FROM LOCAL 'C:\Users\Waqas Ali\Desktop\upload.csv' ENCLOSED BY '"' delimiter ',' SKIP 1 exceptions 'C:\Users\Waqas Ali\Desktop\except.csv' rejected data 'C:\Users\Waqas Ali\Desktop\reject.csv'; When I run this command using the Vertica client it works fine. But when I execute it from Python, it only copies the data from the given file and does not create the files for rejected data.

Waqas Ali Over a year ago

I used this code in Python: with open(file_path, "rb") as inf: cur.copy(copy_command, inf)

Dave Gray · Accepted Answer · 2015-09-21 15:36:57Z

0

Doing single row inserts into Vertica is very inefficient. You need to load in batches.

We do it with the COPY command. Here is an example:

COPY mytable (firstcolumn, secondcolumn) FROM STDIN DELIMITER ',' ENCLOSED BY '"';

Have you considered using an existing library, for example vertica-python?

See Vertica's documentation for more information on the COPY options.
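One way to load in batches without building an 8-million-row dataframe first is to read from the source cursor in chunks and issue one COPY per chunk. This is a sketch under assumptions, not a tested pipeline: `copy_in_batches` and `rows_to_csv_text` are hypothetical helper names, `source_cursor` is any DB-API cursor on which the Oracle query has already been executed, and `target_cursor` is a vertica-python cursor whose `copy` method accepts the CSV text as a string.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize a list of row tuples to CSV text held in memory."""
    buffer = io.StringIO()
    # None is written as an empty field; fields containing commas are quoted.
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def copy_in_batches(source_cursor, target_cursor, statement, batch_size=100_000):
    """Move rows from an executed DB-API cursor into Vertica, one COPY per batch."""
    total = 0
    while True:
        # fetchmany() is part of the DB-API and returns an empty
        # sequence once the result set is exhausted.
        batch = source_cursor.fetchmany(batch_size)
        if not batch:
            break
        target_cursor.copy(statement, rows_to_csv_text(batch))
        total += len(batch)
    return total
```

The `statement` argument would be a COPY statement like the one above, minus the trailing semicolon if the driver objects to it. The batch size of 100,000 is an arbitrary starting value; larger batches mean fewer COPY round trips at the cost of more memory per batch.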

edited Sep 21, 2015 at 15:36

Dave Gray

answered Sep 18, 2015 at 19:40

Chris McFadden

3 Comments

Data Enthusiast Over a year ago

Chris, the COPY command needs a file as its source, but my program holds the data in a dataframe. To use COPY I would have to write the dataframe to a file first, and I am not sure that would help much. Have you used COPY with a dataframe? Or is writing the dataframe to a file and then running COPY on it the way to go?

Dave Gray Over a year ago

It doesn't need to be a file. You can load directly from memory; the vertica-python docs show one way to do this.

Saurabh Saxena Over a year ago

@UdayShankar from github.com/uber/vertica-python/blob/master/vertica_python/…

with open("/tmp/file.csv", "rb") as fs:
    cursor.copy("COPY table(field1,field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'", fs, buffer_size=65536)

Victor Criclivii · Accepted Answer · 2021-03-23 17:38:03Z

If you want to load a dataframe rather than a CSV file into a Vertica table, you can use this:

from vertica_python import connect

db_connection = connect(
    host='hostname',
    port=5433,
    user='user',
    password='password',
    database='db_name',
    unicode_error='replace',
)

cursor = db_connection.cursor()

cursor.copy(
    "COPY table_name (field1, field2, ...) from stdin DELIMITER ','",
    df.to_csv(header=None, index=False),
)

The part that makes the difference is the call below. It converts the in-memory dataframe into comma-separated lines of text that the COPY command can read:

df.to_csv(header=None, index=False)

It works very fast.
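For a dataframe with millions of rows, serializing everything into one string may use a lot of memory. A possible variation, sketched here under assumptions, slices the dataframe and runs one COPY per slice. The function name `copy_dataframe` and the `chunk_rows` default are hypothetical. The statement also adds ENCLOSED BY '"', because `to_csv` by default wraps fields that contain a comma in double quotes.

```python
def copy_dataframe(cursor, df, table, columns, chunk_rows=500_000):
    """Load a pandas DataFrame into `table` with one COPY per slice of rows."""
    # Table and column names are pasted in unescaped; pass trusted values only.
    statement = (
        f"COPY {table} ({', '.join(columns)}) "
        "FROM STDIN DELIMITER ',' ENCLOSED BY '\"'"
    )
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        # to_csv() without a path returns the CSV text as a string,
        # so only one slice is serialized in memory at a time.
        cursor.copy(statement, chunk.to_csv(header=False, index=False))
    return statement
```

A call might look like `copy_dataframe(cursor, df, "table_name", ["field1", "field2"])`, followed by `db_connection.commit()`.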
