2

I want to write a dataframe straight from Python into Netezza. Is this even possible? What I am doing right now is exporting to CSV then uploading into Netezza which is quite time consuming...

A dummy dataset is as follows:

s1=pd.Series(["Test1"])
s2=pd.Series(["Test2"])
s3=pd.Series(["Test3"])
df=pd.DataFrame([list(s1), list(s2),  list(s3)],  columns =  ["STATUS"])

I then connect to Netezza (example):

#import libraries
import pandas as pd
import pyodbc

engine = pyodbc.connect("DRIVER={NetezzaSQL};SERVER=netezzakm01;PORT=5480;DATABASE=SDSNZKM01;UID=KMMTBA;PWD=xxx;")

This is as far as I get because anything else I have tried to write with has come nowhere near working...... is it possible to do this?

1 Answer 1

1

So far, the best I have been able to do is the following:

  1. On the netezza side I create a table with the structure I want,

    create table mytablename ( MYDATE DATE, MYDISTRIBUTEKEY BIGINT, SOMESTRING VARCHAR(128) DEFAULT NULL, SOMEINT INTEGER DEFAULT NULL, SOMEFLOAT FLOAT DEFAULT NULL ) distribute on (mydistributekey);

  2. my dataframe say mydf is such that

    list(mydf.columns) == ['MYDATE','MYDISTRIBUTEKEY','SOMESTRING','SOMEINT','SOMEFLOAT']

  3. dump the dataframe to disk

    mydf.to_csv('df_on_disk.tab',sep='\t',index=False,header=False)

  4. you can now to a BULK load of the file on disk into netezza, from netezza. Note that I connect via a JDBC, with jaydebeapi.

    INSERT INTO mytablename SELECT * FROM EXTERNAL 'df_on_disk.tab' USING (DELIM '\t' REMOTESOURCE 'JDBC'); GENERATE STATISTICS ON mytablename;

This is not the exact solution, but you would want to do something like that.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.