
I am using Python to transfer data (~8 million rows) from Oracle to Vertica. My script completes the transfer in about 2 hours, and I am looking for ways to speed it up.

The process I am using:

  • Connect to Oracle
  • Pull the data into a dataframe (pandas)
  • Iterate over the rows in the dataframe one by one and insert each into Vertica with cursor.execute. I wanted to use the dataframe.to_sql method, but it supports only a couple of databases.

Has anybody used a better way (bulk inserts or any other method) to insert data into Vertica using Python?

Here is the code snippet:

import pandas
import pyodbc

# sql and conn (the Oracle connection) are defined earlier in the script
df = pandas.read_sql_query(sql, conn)
conn_vertica = pyodbc.connect("DSN=dsnname")
cursor = conn_vertica.cursor()

# one INSERT round trip per row; this loop is the slow part
for i, row in df.iterrows():
    cursor.execute("insert into <tablename> values(?,?,?,?,?,?,?,?,?)", *row.values[:9])

cursor.close()
conn_vertica.commit()
conn_vertica.close()

3 Answers


From vertica-python code https://github.com/uber/vertica-python/blob/master/vertica_python/vertica/cursor.py

with open("/tmp/file.csv", "rb") as fs:
    cursor.copy("COPY table(field1,field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'", fs, buffer_size=65536)
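If the data is already in Python rather than in a file, one option is to stage it in a temporary CSV first and then stream that file through cursor.copy in the same way. The following is only a sketch: it assumes `cursor` is a vertica-python cursor, and the helper name `copy_rows_via_file` and its `table` and `columns` arguments are hypothetical, not part of the library.

```python
import csv
import os
import tempfile


def copy_rows_via_file(cursor, table, columns, rows):
    """Stage rows in a temporary CSV file, then bulk-load it with COPY.

    Assumes `cursor` exposes vertica-python's cursor.copy(sql, fileobj, ...).
    `table` and `columns` are trusted identifiers, not user input.
    """
    copy_sql = "COPY {} ({}) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'".format(
        table, ", ".join(columns)
    )
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # newline="" lets the csv module control line endings itself.
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as out:
            csv.writer(out, lineterminator="\n").writerows(rows)
        # Reopen in binary mode and stream the file to Vertica.
        with open(path, "rb") as fs:
            cursor.copy(copy_sql, fs, buffer_size=65536)
    finally:
        # Remove the staging file whether or not the load succeeded.
        os.remove(path)
    return copy_sql
```

The csv module quotes only the fields that contain the delimiter, which matches ENCLOSED BY '"' for simple values. How embedded quote characters and NULLs should be written depends on the COPY options in use, so that part needs checking against real data.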


2 Comments

I want to run this command from Python. How can I do this? copy cb.table_format2 FROM LOCAL 'C:\Users\Waqas Ali\Desktop\upload.csv' ENCLOSED BY '"' delimiter ',' SKIP 1 exceptions 'C:\Users\Waqas Ali\Desktop\except.csv' rejected data 'C:\Users\Waqas Ali\Desktop\reject.csv'; When I run this command from the Vertica client it works fine. But when I execute it from Python, it only copies the data from the given file and does not create the file for the rejected data.
I used this code in Python: with open(file_path, "rb") as inf: cur.copy(copy_command, inf)

Single-row inserts into Vertica are very inefficient. You need to load in batches.

We do it with the COPY command. Here is an example:

COPY mytable (firstcolumn, secondcolumn) FROM STDIN DELIMITER ',' ENCLOSED BY '"';

Have you considered using an existing library, for example vertica-python?

Check out Vertica's docs for more info on COPY options.
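If the load has to stay on an existing pyodbc connection and COPY is not an option, batching can also be approximated with cursor.executemany, which sends many parameter sets per call instead of one. This is a rough sketch rather than a recommendation: the helper name `insert_in_batches` and the `batch_size` default are hypothetical, and it will likely still be slower than COPY.

```python
from itertools import islice


def insert_in_batches(cursor, insert_sql, rows, batch_size=10_000):
    """Insert rows with cursor.executemany, one batch per call.

    `cursor` is assumed to be a DB-API cursor such as pyodbc's.
    Returns the number of rows sent.
    """
    it = iter(rows)
    total = 0
    while True:
        # Pull at most batch_size rows without materializing everything.
        batch = list(islice(it, batch_size))
        if not batch:
            break
        cursor.executemany(insert_sql, batch)
        total += len(batch)
    return total
```

With a dataframe, `df.itertuples(index=False, name=None)` yields plain tuples that can be passed as `rows`.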

3 Comments

Chris, to use the COPY command the source has to be a file, but in my program the data is in a dataframe. To use COPY I would have to write the dataframe to a file first, and I am not sure that would help much. Have you used COPY with a dataframe? Or is writing the dataframe to a file and then running COPY on it the way to go?
It doesn't need to be a file; you can load directly from memory. The vertica-python docs show one way to do this.
@UdayShankar from github.com/uber/vertica-python/blob/master/vertica_python/… ``` with open("/tmp/file.csv", "rb") as fs: cursor.copy("COPY table(field1,field2) FROM STDIN DELIMITER ',' ENCLOSED BY '\"'", fs, buffer_size=65536) ```

If you want to load a dataframe rather than a CSV file into a Vertica table, you can use this:

from vertica_python import connect

db_connection = connect(host='hostname',
                        port=5433,
                        user='user', password='password',
                        database='db_name',
                        unicode_error='replace')

cursor = db_connection.cursor()

# df is the pandas dataframe to load
cursor.copy("COPY table_name (field1, field2, ...) from stdin DELIMITER ','",
            df.to_csv(header=None, index=False))

The part below is what makes the difference: it converts the in-memory dataframe into lines of comma-separated strings that the COPY command can read:

df.to_csv(header=None, index=False)

It works very fast.
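For a table of around 8 million rows, building a single CSV string for the whole dataframe may use a lot of memory. One possible variation is to send the data in chunks, one COPY per chunk. The sketch below uses only the standard library to build each chunk; the helper names `rows_to_csv` and `copy_in_chunks` and the `chunk_size` default are hypothetical, and `cursor` is assumed to be a vertica-python cursor whose copy method accepts a string.

```python
import csv
import io
from itertools import islice


def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text held in memory."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def copy_in_chunks(cursor, copy_sql, rows, chunk_size=100_000):
    """Send rows through COPY ... FROM STDIN one chunk at a time.

    Returns the number of rows sent.
    """
    it = iter(rows)
    sent = 0
    while True:
        # Take the next chunk_size rows; an empty list means we are done.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        cursor.copy(copy_sql, rows_to_csv(chunk))
        sent += len(chunk)
    return sent
```

A dataframe can be fed in with `df.itertuples(index=False, name=None)`, which yields plain tuples. Whether chunking is actually faster or slower than a single COPY depends on the data and the cluster, so it is worth measuring both.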
