2

Let's say that I have this dataframe :

REFERENCE = ["GZF882348G", "SFGUZBJLNJU", "FTLNGZ242112", "DFBHGVGHG543"]
IBAN = ["FR7343563", "FR4832545", "FR9858331", "FR2001045"]
DEBIT = [26, '', 856, '']
CREDIT = ['', 324, '', 876]
MONTANT = [641, 33, '', 968]

df = pd.DataFrame({'Référence' : REFERENCE, 'IBAN' : IBAN, 'Débit' : DEBIT, 'Crédit' : CREDIT, 'Montant' : MONTANT})

I have a problem of format to insert this kind of data in my database. The columns "Débit", "Crédit", "Montant" are defined to get floats as data. However the data of these columns are not only integers, I have empty strings too and that is my issue. I know that I have to write a condition that replace a empty string by a "Null" value in the SQL format, however I do not know how to do that in python or in SQL. I am discovering/learning the SQL environment.

Here is my code :

import pandas as pd
import pyodbc 

server = '...'
database = '...'
username = '...' 
password = '...'
driver = '...'

connection = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+password)
cursor = connection.cursor()

for i, row in df.iterrows():


    sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"
    cursor.execute(sql_exe, tuple(row))
    
    connection.commit()

Anyone can help me please.

Thank you

2 Answers 2

1

You appear to be mixing types in Pandas data frame where string, '', is combined with integer in the same column as evidenced by all object types. In relational databases you cannot mix data types. And converting '' to string 'NULL' will not resolve your issue. In SQL, NULL <> 'NULL'

df.dtypes

# Référence    object
# IBAN         object
# Débit        object
# Crédit       object
# Montant      object
# dtype: object

Therefore, convert columns to numeric with pd.to_numeric where empty string, '', converts to NaN which this entity should translate to SQL's NULL entity.

df[['Débit', 'Crédit', 'Montant']] = df[['Débit', 'Crédit', 'Montant']].apply(pd.to_numeric)

df.dtypes
# Référence     object
# IBAN          object
# Débit        float64
# Crédit       float64
# Montant      float64
# dtype: object

df
#       Référence       IBAN  Débit  Crédit  Montant
# 0    GZF882348G  FR7343563   26.0     NaN    641.0
# 1   SFGUZBJLNJU  FR4832545    NaN   324.0     33.0
# 2  FTLNGZ242112  FR9858331  856.0     NaN      NaN
# 3  DFBHGVGHG543  FR2001045    NaN   876.0    968.0

Then run your query. In fact, avoid the slower for loop with iterrows and consider df.to_numpy + cursor.executemany.

# PREPARED STATEMENT
sql_exe = "INSERT INTO dbo.tbl_data_xml (Réference,IBAN,Débit,Crédit,Montant) VALUES (?,?,?,?,?)"

# CONVERT DATA TO LIST OF NUMPY ARRAYS
sql_data = df.where(pd.notnull(df), None).to_numpy().replace(.tolist()

# EXECUTE ACTION QUERY
cursor.executemany(sql_exe, sql_data)
connection.commit()
Sign up to request clarification or add additional context in comments.

6 Comments

Thank you @Parfait for your help, it was exactly what I was looking for. However I still can not say that it works, because I have a new error that I do not understand : "The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 9 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision." My dataframe has 9 columns in reality, I have defined these columns to get (float,null) in the database
Sounds like you did not properly convert needed column as empty string still renders. Can you post a data frame sample of 9 columns rather than 5 columns? Also, are you adjusting the insert columns of SQL statement? Try to stay as consistent to this answer as possible when adding/removing columns. Please post your new attempt.
Thank you, you find it in this new post : stackoverflow.com/questions/65080139/… If you can help me, I would appreciate it
Technically your question is a duplicate of this one which leaves this unresolved. But I see you have an answer.
See my edit, converting remaining NaN to None using DataFrame.where.
|
0

Convert to numeric the respective columns and fillna(NULL)

df[['Débit', 'Crédit', 'Montant']]=df.iloc[:,2:].apply(lambda x: pd.to_numeric(x).fillna('NULL'))



     Référence       IBAN Débit Crédit Montant
0    GZF882348G  FR7343563    26   NULL     641
1   SFGUZBJLNJU  FR4832545  NULL    324      33
2  FTLNGZ242112  FR9858331   856   NULL    NULL
3  DFBHGVGHG543  FR2001045  NULL    876     968

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.