I have a DataFrame with COVID-19 related data.
Here is an example row of that data:
('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-28', 1, 110.0, 100, 7, '2020-11-03'),
I am setting up the connection the following way:
import urllib.parse
from sqlalchemy import create_engine

quoted = urllib.parse.quote_plus("DRIVER={.../msodbcsql17/lib64/libmsodbcsql-17.6.so.1.1};SERVER=******;DATABASE=****;uid=***;pwd=***")
engine = create_engine('mssql+pyodbc:///?odbc_connect={}'.format(quoted))
con = engine.connect()
I then try to write to the database:
df.to_sql('THE_TABLE', con=con, if_exists='append', index=False, schema='cd')
This throws the following error:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The
incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parame$
The above exception was the direct cause of the following exception:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver
17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) pr$
[SQL: INSERT INTO cd.[EXT_DOUBLING_RATE] ([Country_Region], [Case_Type], [Doubling Rate],
[Coefficient], [Intercept], p_value, [R_squared], [Date], [Days normalized], [Cases], [Cutoff value],
[Window s$
[parameters: (('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-27', 0, 110.0, 100,
7, '2020-11-06'), ('Afghanistan', 'Confirmed', None, None, None, None, None, '2020-03-28', 1, 110.0$
(Background on this error at: http://sqlalche.me/e/f405)
It seems to be related to the None values, because if I insert the exact same row directly in the database tool with NULL instead of None, it works.
So how do I push the data to the Microsoft SQL database such that it understands that None is NULL?
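As a sanity check that does not depend on SQL Server, the sketch below writes a small made-up frame containing None to an in-memory SQLite database and counts how many rows arrive as NULL. It only illustrates how to_sql usually treats missing values; SQLite is a stand-in chosen for illustration, the table name demo_table is hypothetical, and none of this reproduces the pyodbc error above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real server (assumption for illustration)
con = sqlite3.connect(":memory:")

# Made-up two-row frame modeled on the example row, not the real data
df = pd.DataFrame({
    "Country_Region": ["Afghanistan", "Afghanistan"],
    "Coefficient": [None, 0.5],
    "Cases": [110.0, 110.0],
})

# Same call shape as in the question, minus the schema argument
df.to_sql("demo_table", con=con, if_exists="append", index=False)

# Count rows where the missing value was stored as SQL NULL
null_rows = con.execute(
    "SELECT COUNT(*) FROM demo_table WHERE Coefficient IS NULL"
).fetchone()[0]
print(null_rows)
con.close()
```

If the missing value is stored as NULL here, that would suggest None by itself is not what the driver rejects, and that the offending parameter may be something else in the batch.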
This is the output from df.info():
Data columns (total 13 columns):
Country_Region         69182 non-null object
Case_Type              69182 non-null object
Doubling Rate          63752 non-null float64
Coefficient            67140 non-null float64
Intercept              67140 non-null float64
p_value                67042 non-null float64
R_squared              63752 non-null float64
Date                   69182 non-null object
Days normalized        69182 non-null int64
Cases                  69182 non-null float64
Cutoff value           69182 non-null int64
Window size            69182 non-null int64
Script Refresh Date    69182 non-null object
dtypes: float64(6), int64(3), object(4)
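Since the columns with missing values are all float64, one thing that may be worth checking is whether any of them also hold non-finite values (inf or -inf), which SQL Server's float type does not accept. The truncated error message does not confirm that this is the cause here, so the following is only a sketch on a made-up frame; the helper name non_finite_to_none is hypothetical and the column names are borrowed from the df.info() output.

```python
import numpy as np
import pandas as pd


def non_finite_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where NaN, inf and -inf become None (hypothetical helper)."""
    out = df.replace([np.inf, -np.inf], np.nan)
    # Casting to object first lets the columns hold real None values
    return out.astype(object).where(out.notna(), None)


# Made-up sample, not the real data
sample = pd.DataFrame({
    "Country_Region": ["Afghanistan", "Afghanistan"],
    "Doubling Rate": [np.nan, np.inf],
    "Cases": [110.0, 110.0],
})

# Count inf/-inf per float column to see whether any are present
float_cols = sample.select_dtypes(include="float")
inf_counts = np.isinf(float_cols).sum()
print(inf_counts)

# Frame that could then be passed to to_sql
cleaned = non_finite_to_none(sample)
```

If inf_counts is zero for every column of the real frame, this line of investigation can probably be dropped.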
None? Better yet, edit your question with the df.info().