Writing a dataframe from Python3 into Netezza

Question

I want to write a dataframe straight from Python into Netezza. Is this even possible? What I am doing right now is exporting to CSV then uploading into Netezza which is quite time consuming...

A dummy dataset is as follows:

s1=pd.Series(["Test1"])
s2=pd.Series(["Test2"])
s3=pd.Series(["Test3"])
df=pd.DataFrame([list(s1), list(s2),  list(s3)],  columns =  ["STATUS"])

I then connect to Netezza (example):

#import libraries
import pandas as pd
import pyodbc

engine = pyodbc.connect("DRIVER={NetezzaSQL};SERVER=netezzakm01;PORT=5480;DATABASE=SDSNZKM01;UID=KMMTBA;PWD=xxx;")

This is as far as I get because anything else I have tried to write with has come nowhere near working...... is it possible to do this?

tipanverella · Accepted Answer · 2017-10-04 19:59:18Z

So far, the best I have been able to do is the following:

On the netezza side I create a table with the structure I want,

create table mytablename ( MYDATE DATE, MYDISTRIBUTEKEY BIGINT, SOMESTRING VARCHAR(128) DEFAULT NULL, SOMEINT INTEGER DEFAULT NULL, SOMEFLOAT FLOAT DEFAULT NULL ) distribute on (mydistributekey);
my dataframe say mydf is such that

list(mydf.columns) == ['MYDATE','MYDISTRIBUTEKEY','SOMESTRING','SOMEINT','SOMEFLOAT']
dump the dataframe to disk

mydf.to_csv('df_on_disk.tab',sep='\t',index=False,header=False)
you can now to a BULK load of the file on disk into netezza, from netezza. Note that I connect via a JDBC, with jaydebeapi.

INSERT INTO mytablename SELECT * FROM EXTERNAL 'df_on_disk.tab' USING (DELIM '\t' REMOTESOURCE 'JDBC'); GENERATE STATISTICS ON mytablename;

This is not the exact solution, but you would want to do something like that.

Collectives™ on Stack Overflow

Writing a dataframe from Python3 into Netezza

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related