Given a pandas dataframe containing possible NaN values scattered here and there:
Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?
UPDATE: using Pandas 0.22.0
Newer Pandas versions have the methods DataFrame.isna() and DataFrame.notna(), which are aliases for isnull() and notnull().
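For reference, the example DataFrame used throughout can be rebuilt from the outputs shown below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
                   'b': [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
                   'c': [0, 4, 4, 0, 9, 9, 9, 4, 9, 1]})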
In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [72]: df.isna().any()
Out[72]:
a True
b True
c False
dtype: bool
As a list of column names:
In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']
to select those columns (containing at least one NaN value):
In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
OLD answer:
Try using isnull():
In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1
In [98]: pd.isnull(df).sum() > 0
Out[98]:
a True
b True
c False
dtype: bool
or, as @root proposed, the clearer version:
In [5]: df.isnull().any()
Out[5]:
a True
b True
c False
dtype: bool
In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']
to select a subset - all columns containing at least one NaN value:
In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0
Comments: to find columns containing a specific value rather than NaN, use df.columns[df.eq(search_for_value).any()].tolist() (df.isin([...]) works similarly). You can also use df.isnull().sum(), which shows all columns and the total number of NaNs in each.
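A minimal sketch of that value-search variant; search_for_value is a placeholder, set here to 4:
search_for_value = 4  # any value you want to locate instead of NaN
df.columns[df.eq(search_for_value).any()].tolist()
# with the example df above this returns ['b', 'c']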
I had a problem where I had too many columns to visually inspect on the screen, so a short list comprehension that filters and returns the offending columns is
nan_cols = [i for i in df.columns if df[i].isnull().any()]
if that's helpful to anyone
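With the example df from the update above, nan_cols comes out as ['a', 'b'].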
Adding to that, if you want to filter out columns having more NaN values than a threshold, say 85%, then use
nan_cols85 = [i for i in df.columns if df[i].isnull().sum() > 0.85*len(df)]
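An equivalent one-liner relies on the fact that the mean of a boolean mask is the fraction of True values, so no explicit length is needed:
nan_cols85 = df.columns[df.isnull().mean() > 0.85].tolist()  # columns more than 85% NaN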
This worked for me:
1. For getting the columns having at least 1 null value (column names):
data.columns[data.isnull().any()]
2. For getting the null count of each column having at least 1 null value:
data[data.columns[data.isnull().any()]].isnull().sum()
[Optional] 3. For getting the percentage of null values in those columns:
data[data.columns[data.isnull().any()]].isnull().sum() * 100 / data.shape[0]
I know this is a very well-answered question but I wanted to add a slight adjustment. This one returns only the columns containing nulls, while still showing the count of the nulls.
pd.isnull(df).sum()[pd.isnull(df).sum() > 0]
Or, broken into named steps:
null_count_ser = pd.isnull(df).sum()
is_null_ser = null_count_ser > 0
null_count_ser[is_null_ser]
Sample output:
name       5
phone    187
age      644
In datasets with a large number of columns it's even better to see how many columns contain null values and how many don't.
print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))
print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))
print("Total no. of columns in the dataframe")
print(len(df.columns))
For example in my dataframe it contained 82 columns, of which 19 contained at least one null value.
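Since notna().all() is the exact per-column complement of isna().any(), the two counts always add up to the total; a quick sanity check:
assert len(df.columns[df.isna().any()]) + len(df.columns[df.notna().all()]) == len(df.columns)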
Further, you can also automatically remove columns and rows depending on which has more null values.
Here is code which does this:
df = df.drop(df.columns[df.isna().sum() > len(df)/2], axis=1)  # drop columns where more than half the values are NaN
df = df.dropna(axis=0).reset_index(drop=True)                  # then drop any remaining rows with NaNs
Note: the above removes all of your null values. If you want to keep the null values, process them beforehand.
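The column-dropping step can also be written with dropna's thresh parameter, which keeps only columns having at least that many non-NaN values:
df = df.dropna(axis=1, thresh=len(df) // 2)  # keep columns with at least half their values present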
This is one of the methods:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan], 'c': [np.nan, 2, np.nan], 'd': [np.nan, np.nan, np.nan]})
print(pd.isnull(df).sum())
To see just the columns containing NaNs and just the rows containing NaNs:
isnulldf = df.isnull()
columns_containing_nulls = isnulldf.columns[isnulldf.any()]
rows_containing_nulls = df[isnulldf[columns_containing_nulls].any(axis='columns')].index
only_nulls_df = df[columns_containing_nulls].loc[rows_containing_nulls]
print(only_nulls_df)
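With the example df above, this prints columns a and b restricted to rows 0, 1 and 2, i.e. the only rows that actually contain NaNs.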
To list the columns with missing values along with the percentage missing in each:
features_with_na = [feature for feature in dataframe.columns if dataframe[feature].isnull().sum() > 0]
for feature in features_with_na:
    print(feature, np.round(dataframe[feature].isnull().mean() * 100, 4), '% missing values')
print(features_with_na)
If you want to write it as a one-liner (could be useful if functions need to be called sequentially in a pipeline), then you can do so using either pipe() or passing a callable to loc[]. pipe() can be used to get the columns with NaN values as well.
df.isna().any().pipe(lambda x: x.index[x])
df.isna().any().loc[lambda x: x].index
A working example:
df = pd.DataFrame({
'a': [1, 2, pd.NA],
'b': [10, 20, 30],
'c': [pd.NA, 'B', 'C']
})
df.isna().any().pipe(lambda x: x.index[x]) # Index(['a', 'c'], dtype='object')
df.isna().any().loc[lambda x: x].index # Index(['a', 'c'], dtype='object')
df.isna().any().pipe(lambda x: df.loc[:, x])
      a     c
0     1  <NA>
1     2     B
2  <NA>     C
If you want the opposite, i.e. columns without any NaN, then notna().all() can be used instead of isna().any().
df.notna().all().pipe(lambda x: x.index[x]) # Index(['b'], dtype='object')
df.isna().any()[lambda x: x] works for me.
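Indexing with a callable keeps only the True entries of the boolean Series, so appending .index gives the column names, just like the loc[] variant above.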