
I have a DataFrame with COVID-19 related data.

Here is an example row of said data:

('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-28', 1, 110.0, 100, 7, '2020-11-03'),

I am setting up the connection the following way:

  quoted = urllib.parse.quote_plus("DRIVER={.../msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1};SERVER=******;DATABASE=****;uid=***;pwd=***")
  engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
  con = engine.connect()

I then try to write to the db

  df.to_sql('THE_TABLE', con=con, if_exists='append', index=False, schema='cd')

Which throws the following error

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The 
incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parame$

The above exception was the direct cause of the following exception:

sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 
17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) pr$
[SQL: INSERT INTO cd.[EXT_DOUBLING_RATE] ([Country_Region], [Case_Type], [Doubling Rate], 
[Coefficient], [Intercept], p_value, [R_squared], [Date], [Days normalized], [Cases], [Cutoff value], 
[Window s$
[parameters: (('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-27', 0, 110.0, 100, 
7, '2020-11-06'), ('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-28', 1, 110.0$
(Background on this error at: http://sqlalche.me/e/f405)

It seems that it has to do with the None values, because if I try to insert the exact same row straight in the database tool with the value NULL instead of None, it works.

So how do I push the data to the Microsoft SQL database such that it understands that None is NULL?

This is the output from df.info()

Data columns (total 13 columns):
Country_Region         69182 non-null object
Case_Type              69182 non-null object
Doubling Rate          63752 non-null float64
Coefficient            67140 non-null float64
Intercept              67140 non-null float64
p_value                67042 non-null float64
R_squared              63752 non-null float64
Date                   69182 non-null object
Days normalized        69182 non-null int64
Cases                  69182 non-null float64
Cutoff value           69182 non-null int64
Window size            69182 non-null int64
Script Refresh Date    69182 non-null object
dtypes: float64(6), int64(3), object(4)

1 Answer


It does seem, as you say, to be some issue with the None values. But there is a workaround: replace all None with NaN before writing to the DB. Here is an example where I create a DB to write to.

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

Create some dataframe df with None values

df = pd.DataFrame(np.random.rand(5,3))
df2 = df.where(df < .2, None)

which gives

0   0.178066    None    0.00600411
1   None    0.0294849   None
2   None    0.00374341  None
3   None        None    None
4   0.182899    None    None

Replace all None with NaN

DF = df2.fillna(value=np.nan)

which gives

           0         1         2
0   0.178066       NaN  0.006004
1        NaN  0.029485       NaN
2        NaN  0.003743       NaN
3        NaN       NaN       NaN
4   0.182899       NaN       NaN
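The reason this works: fillna(value=np.nan) fills every missing slot, and pandas treats Python None in an object column as missing, so each None becomes a float NaN, which the SQL layer then binds as NULL. A minimal self-contained check of that replacement (the toy frame and column names "a"/"b" here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Object-dtype frame holding Python None, like df2 above
raw = pd.DataFrame({"a": [0.5, None], "b": [None, "x"]}, dtype=object)

# Every None is treated as missing and replaced with float NaN
clean = raw.fillna(value=np.nan)

print(clean.loc[1, "a"])  # nan, not None
```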

Now some cosmetics:

DF = DF.rename(columns={0: 'a', 1: 'b', 2: 'c'})

In this step I create a DB to upload to, for testing, and upload the DF:

database = create_engine('sqlite:///database.db', echo=False)
DF.to_sql("FACTS2", if_exists='replace', con=database)

Now, if what was uploaded really is NULL, then a query filtering on NULL should return those rows:

result = database.execute("SELECT a, b, c FROM FACTS2 WHERE a IS NULL")

The result can then be read into a pandas DataFrame:

pd.DataFrame(result)

Which is

    a       b           c
0   None    0.029485    None
1   None    0.003743    None
2   None    NaN         None

Conclusion: NULLs are being written to your DB, so the key to solving your problem is simply DF = df2.fillna(value=np.nan). Note the strange thing that can happen, though: in column b (which is not entirely NULL), after the query the NULLs are shown as NaN in the pandas DataFrame. This is not a problem in itself; the following query shows that there is nothing dodgy about how they are stored in the DB:

result = database.execute("SELECT a, b, c FROM FACTS2 WHERE b IS NULL")

giving

      a         b       c
0   0.178066    None    0.006004
1   NaN         None    NaN
2   0.182899    None    NaN

This is a known issue.
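Putting it together for your setup, the only change needed before to_sql is the fillna step. Sketched below against an in-memory SQLite engine, since I can't reach your SQL Server; the single-column frame and the table name are illustrative stand-ins, not your real schema:

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text

# Toy stand-in for the real frame: one row, one None value
df = pd.DataFrame(
    {"Country_Region": ["Afghanistan"], "Doubling Rate": [None]}, dtype=object
)

# The fix: replace Python None with NaN so the driver binds NULL
df = df.fillna(value=np.nan)

engine = create_engine("sqlite://")  # stand-in for the mssql+pyodbc engine
df.to_sql("EXT_DOUBLING_RATE", con=engine, if_exists="append", index=False)

# Verify that the NaN arrived in the DB as NULL
with engine.connect() as con:
    n = con.execute(
        text('SELECT COUNT(*) FROM EXT_DOUBLING_RATE WHERE "Doubling Rate" IS NULL')
    ).scalar()
print(n)  # 1
```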
