Python/SQL: replacing the empty strings of a DataFrame with a "Null" value to insert the data into a database

Question

Let's say that I have this dataframe :

REFERENCE = ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"]
IBAN = ["FR7343563", "FR4832545", "FR9858331", "FR2001045"]
DEBIT = [26, '', 856, '']
CREDIT = ['', 324, '', 876]
MONTANT = [641, 33, '', 968]

df = pd.DataFrame({'Référence' : REFERENCE, 'IBAN' : IBAN, 'Débit' : DEBIT, 'Crédit' : CREDIT, 'Montant' : MONTANT})

I have a format problem when inserting this kind of data into my database. The columns "Débit", "Crédit" and "Montant" are defined to hold floats. However, the data in these columns is not only integers: there are empty strings too, and that is my issue. I know that I have to write a condition that replaces an empty string with a "Null" value in the SQL format, but I do not know how to do that in Python or in SQL. I am still discovering and learning the SQL environment.

Here is my code :

import pandas as pd
import pyodbc 

server = '...'
database = '...'
username = '...' 
password = '...'
driver = '...'

connection = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password)
cursor = connection.cursor()

for i, row in df.iterrows():


    sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"
    cursor.execute(sql_exe, tuple(row))
    
    connection.commit()

Can anyone help me, please?

Thank you

Parfait · Accepted Answer · 2020-11-30 21:55:07Z

1

You appear to be mixing types in your pandas DataFrame: the string '' sits alongside integers in the same column, as the all-object dtypes below show. Relational databases do not allow mixed data types within a column, and converting '' to the string 'NULL' will not resolve your issue. In SQL, NULL <> 'NULL'.

df.dtypes

# Référence    object
# IBAN         object
# Débit        object
# Crédit       object
# Montant      object
# dtype: object
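To illustrate the difference between NULL and 'NULL', here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in database (the question uses SQL Server through pyodbc), and the table name demo is made up for the example. SQLite happens to tolerate text in a REAL column; a stricter server would be expected to reject it.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (label TEXT, amount REAL)")

# Python None is bound as SQL NULL; the string 'NULL' is ordinary text.
cur.executemany(
    "INSERT INTO demo (label, amount) VALUES (?, ?)",
    [("real null", None), ("text null", "NULL")],
)
conn.commit()

# typeof() reports how each value was actually stored.
stored_types = dict(cur.execute("SELECT label, typeof(amount) FROM demo"))
print(stored_types)
```

Only the row inserted with None is stored as a true NULL; the other row holds a four-character string.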

Therefore, convert the columns to numeric with pd.to_numeric, which turns each empty string '' into NaN. That NaN must in turn reach the driver as None so that it is stored as SQL's NULL.

df[['Débit', 'Crédit', 'Montant']] = df[['Débit', 'Crédit', 'Montant']].apply(pd.to_numeric)

df.dtypes
# Référence     object
# IBAN          object
# Débit        float64
# Crédit       float64
# Montant      float64
# dtype: object

df
#       Référence       IBAN  Débit  Crédit  Montant
# 0    GZF882348G  FR7343563   26.0     NaN    641.0
# 1   SFGUZBJLNJU  FR4832545    NaN   324.0     33.0
# 2  FTLNGZ242112  FR9858331  856.0     NaN      NaN
# 3  DFBHGVGHG543  FR2001045    NaN   876.0    968.0
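If a column may also hold strings that are neither empty nor numeric, pd.to_numeric raises an error by default. The sketch below assumes that silently turning such values into NaN is acceptable and passes errors="coerce"; the sample frame and the "n/a" value are invented for illustration.

```python
import pandas as pd

# Hypothetical sample: "" and "n/a" both stand for a missing amount.
sample = pd.DataFrame({"Débit": [26, "", "n/a"], "Crédit": ["", 324, 12.5]})
numeric_cols = ["Débit", "Crédit"]

# errors="coerce" replaces anything unparseable with NaN instead of raising.
converted = sample[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted.isna().sum())
```

Counting the NaN values per column afterwards is a quick way to see how many entries were coerced.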

Then run your query. In fact, avoid the slower for loop with iterrows and consider df.to_numpy + cursor.executemany.

# PREPARED STATEMENT
sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"

# CONVERT DATA TO LIST OF LISTS, REPLACING NaN WITH None
# (CAST TO object FIRST SO None IS NOT TURNED BACK INTO NaN)
sql_data = df.astype(object).where(pd.notnull(df), None).to_numpy().tolist()

# EXECUTE ACTION QUERY
cursor.executemany(sql_exe, sql_data)
connection.commit()
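The conversion steps above can be wrapped in one small function and checked before anything is sent to the database. This is only a sketch: to_sql_rows is a hypothetical helper name, and the cast to object before where() is there so that None is kept rather than coerced back to NaN.

```python
import pandas as pd


def to_sql_rows(frame, numeric_cols):
    """Hypothetical helper: return rows as lists, with missing values as None."""
    out = frame.copy()
    # Empty strings become NaN and the columns become float64.
    out[numeric_cols] = out[numeric_cols].apply(pd.to_numeric)
    # Cast to object so that None survives in place of NaN.
    out = out.astype(object).where(out.notna(), None)
    return out.to_numpy().tolist()


df = pd.DataFrame({
    "Référence": ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"],
    "IBAN": ["FR7343563", "FR4832545", "FR9858331", "FR2001045"],
    "Débit": [26, "", 856, ""],
    "Crédit": ["", 324, "", 876],
    "Montant": [641, 33, "", 968],
})

rows = to_sql_rows(df, ["Débit", "Crédit", "Montant"])
for row in rows:
    print(row)
```

The resulting rows can then be passed to cursor.executemany(sql_exe, rows) exactly as in the snippet above.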

edited Nov 30, 2020 at 21:55

answered Nov 30, 2020 at 2:13

Parfait


6 Comments

Maikiii Over a year ago

Thank you @Parfait for your help, it was exactly what I was looking for. However, I cannot yet say that it works, because I get a new error that I do not understand: "The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 9 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision." In reality my dataframe has 9 columns, which I have defined as (float, null) in the database.

Parfait Over a year ago

Sounds like you did not properly convert a needed column, since an empty string still comes through. Can you post a data frame sample with 9 columns rather than 5? Also, are you adjusting the insert columns of the SQL statement? Try to stay as consistent with this answer as possible when adding or removing columns. Please post your new attempt.

Maikiii Over a year ago

Thank you, you can find it in this new post: stackoverflow.com/questions/65080139/… If you can help me, I would appreciate it.

Parfait Over a year ago

Technically, your new question is a duplicate of this one, which leaves this one unresolved. But I see you have an answer.

Parfait Over a year ago

See my edit, converting remaining NaN to None using DataFrame.where.


wwnde · Accepted Answer · 2020-11-29 21:35:56Z

0

Convert the respective columns to numeric, then fill the missing values with fillna('NULL').

df[['Débit', 'Crédit', 'Montant']]=df.iloc[:,2:].apply(lambda x: pd.to_numeric(x).fillna('NULL'))



     Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968

answered Nov 29, 2020 at 21:35

wwnde
