I am using Python to transfer data (~8 million rows) from Oracle to Vertica. I wrote a Python script that transfers the data in 2 hours, but I am looking for ways to increase the transfer speed.
Process I am using:
- Connect to Oracle
- Pull the data into a dataframe (pandas)
- Iterate over the rows of the dataframe one by one and insert each into Vertica (cursor.execute). I wanted to use the DataFrame.to_sql method, but it only supports a couple of databases
Has anybody used a better way (bulk inserts or any other method?) to insert data into Vertica using Python?
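One direction I am considering is batching the rows and calling executemany instead of one execute per row. A rough sketch of the idea (the batch size of 10000 and the table name are placeholders, and whether the Vertica ODBC driver actually turns this into a faster bulk operation is an assumption on my part):

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# rows = [tuple(r) for r in df.itertuples(index=False, name=None)]
# cursor = conn_vertica.cursor()
# for batch in chunked(rows, 10000):
#     cursor.executemany(
#         "insert into <tablename> values(?,?,?,?,?,?,?,?,?)", batch)
# conn_vertica.commit()
```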
Here is the code snippet:
import pandas
import pyodbc

# conn is an existing Oracle connection; sql is the extraction query
df = pandas.read_sql_query(sql, conn)

conn_vertica = pyodbc.connect("DSN=dsnname")
cursor = conn_vertica.cursor()
for i, row in df.iterrows():
    # one INSERT per row; the 9 placeholders match the table's columns
    cursor.execute(
        "insert into <tablename> values(?,?,?,?,?,?,?,?,?)",
        tuple(row.values))
cursor.close()
conn_vertica.commit()
conn_vertica.close()
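I have also read that Vertica's bulk loader (COPY ... FROM STDIN) through the vertica-python driver can be much faster than individual ODBC inserts. A rough sketch of what I am considering (the connection parameters and table name are placeholders, and I have not yet verified the COPY options against my actual data):

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize an iterable of row tuples into an in-memory CSV stream."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

# import vertica_python
# conn_info = {'host': 'hostname', 'port': 5433, 'user': 'user',
#              'password': 'password', 'database': 'db'}
# with vertica_python.connect(**conn_info) as conn_v:
#     cur = conn_v.cursor()
#     cur.copy("COPY <tablename> FROM STDIN DELIMITER ','",
#              rows_to_csv(df.itertuples(index=False, name=None)))
```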