
I am using pandas read_sql_query to read data from a MySQL database table into a pandas DataFrame. Some columns in this table have all NULL values. For those columns the DataFrame contains None in every row; for all other columns the DataFrame contains NaN where there was a NULL value. Can anyone explain why None is returned for the all-NULL columns, and how I can make sure I get NaN everywhere, ideally without doing manual conversions? I should add that two of the columns causing this problem are of type float and the third is of type double.

EDIT

Here is an example. The columns pef and fer contain only NULLs in the database.

from sqlalchemy import create_engine
import pandas as pd

querystr = "SELECT * FROM dbname.mytable"
engine = create_engine('mysql+pymysql://username:password@localhost/dbname')
df = pd.read_sql_query(querystr, engine)
df.head()

    sys     dias    pef     fer
0   NaN     NaN     None    None
1   159.0   92.666  None    None
2   NaN     NaN     None    None
3   NaN     NaN     None    None
4   102.0   63.333  None    None

In the MySQL database these columns are defined as:

Columns: 
    sys float 
    dias float 
    pef float 
    fer float

I would expect the columns pef and fer to contain NaN in each row, not None.

1 Comment

  • Can you add a minimal example of how your data looks in your database, how it looks when you parse it with pandas, and how you expect it to appear? Just edit your question to include those plus any code you are currently using. Commented Nov 15, 2018 at 8:24

2 Answers


The problem is an open issue and is explained here: https://github.com/pandas-dev/pandas/issues/14314

read_sql_query just gets result sets back, without any column type information. If you use read_sql_table instead, it uses the column type information available through SQLAlchemy.

It seems that read_sql_query only checks the first 3 values returned in a column to determine its type. So if the first 3 values are NULL, it cannot determine the column's type and returns None.
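To illustrate the inference problem (this is just a sketch of the observable behaviour, not pandas' internal logic): a column built entirely from None stays as object dtype full of None, while a column mixing floats with None is coerced to float64 with NaN:

import pandas as pd

# all-None column: no numeric type can be inferred, so it stays object with None
pd.DataFrame({'fer': [None, None, None]}).dtypes    # fer    object

# mixed column: pandas infers float64 and converts None to NaN
pd.DataFrame({'sys': [None, 159.0, None]}).dtypes   # sys    float64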

So a partial workaround is to use read_sql_table. I changed my code to use read_sql_table and it returns NaN values as expected, even for the all-NULL columns. But in my real application I really need to use read_sql_query, so I now replace any None values with NaN as soon as the results are returned:

import numpy as np
df.replace([None], np.nan, inplace=True)
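For reference, the read_sql_table variant mentioned above might look like the sketch below; it reuses the engine from the question, and 'mytable' and 'dbname' are taken from the original query string:

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('mysql+pymysql://username:password@localhost/dbname')

# read_sql_table obtains column types via SQLAlchemy, so the all-NULL float
# columns come back as float64 with NaN instead of object columns with None
df = pd.read_sql_table('mytable', engine)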

2 Comments

Very helpful. But out of interest, why do you really need to use read_sql_query rather than read_sql_table in your application?
@Andy, one example: you cannot use read_sql_table if you only need a subset of the table and reading the complete table would return way too many records.

I tried using read_sql_table and it does not fix the issue for me. Additionally, I found the accepted answer actually creates other issues.

For my data, the only columns that have None instead of NaN are the ones pandas treats as object dtype. For datetime columns the missing values are NaT; for float columns they are NaN.

read_sql_table did not work for me and showed the same issue as read_sql. So I tried the accepted answer and ran df.replace([None], np.nan, inplace=True). This actually changed all my datetime columns with missing data to object dtype, so now I'd have to convert them back to datetime, which can be taxing depending on the size of your data.

Instead, I recommend you first identify the object-dtype columns in your df and then replace the None values only in those columns:

import numpy as np
# replace None only in the object-dtype columns so datetime columns keep NaT
obj_columns = list(df.select_dtypes(include=['object']).columns.values)
df[obj_columns] = df[obj_columns].replace([None], np.nan)
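A quick way to see the difference on a toy frame (the 'visit' column name is made up for illustration): the targeted replace leaves a datetime column with a missing value untouched, whereas the blanket replace from the accepted answer can push it to object dtype:

import numpy as np
import pandas as pd

# toy frame: one datetime column with a missing value, one all-None object column
df = pd.DataFrame({
    'visit': pd.to_datetime(['2018-11-15', None]),    # datetime64[ns] with NaT
    'pef': pd.Series([None, None], dtype='object'),   # object column of None
})

obj_columns = list(df.select_dtypes(include=['object']).columns.values)
df[obj_columns] = df[obj_columns].replace([None], np.nan)

print(df.dtypes)  # 'visit' is still datetime64[ns]; only 'pef' was touched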

