Pandas - dropping rows with missing data not working using .isnull(), notnull(), dropna()

Question

This is really weird. I have tried several ways of dropping rows with missing data from a pandas DataFrame, but none of them seems to work. This is the code (I uncomment one method at a time; these are the three I have tried in various forms, and this is the latest version):

import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.ix[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print("And now")
print(Test)

But in all cases, all I get is this:

   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5

Am I making a mistake somewhere, or what is the problem? Ideally, I would like to get this:

   A    B    C
0  1    1    1
1  2    2    2
4  5    5    5

Do you actually have the strings NaN and NaT instead of np.nan and np.datetime64('NaT')? .dropna() will work correctly with the latter...
– Jon Clements, Commented Sep 6, 2016 at 2:57
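
One quick way to check which case applies is to look at how many values pandas counts as missing and at the Python type of each value in a suspect column. A minimal sketch using the frame from the question (the names `null_counts` and `type_counts` are illustrative, not from the original post):

```python
import pandas as pd

# The frame from the question: 'NaN' and 'NaT' are ordinary strings here.
Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# Count the values pandas treats as missing in each column.
# Placeholder strings are not counted, so every count is zero.
null_counts = Test.isnull().sum()
print(null_counts)

# Count the Python types stored in column B; a 'str' entry exposes the placeholder.
type_counts = Test['B'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If the missing-value counts are all zero while a `str` shows up among the types, the column holds text placeholders rather than real missing values.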

Jon Clements · Accepted Answer · 2016-09-06 03:02:02Z

18

Your example DF has NaN and NaT as strings, which .dropna, .notnull and co. won't treat as missing, so given your example you can use...

df[~df.isin(['NaN', 'NaT']).any(axis=1)]

Which keeps only the rows at index 0, 1 and 4, where no column holds one of those strings.
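
For completeness, here is a self-contained sketch of that filter applied to the question's data. The intermediate names `is_placeholder` and `cleaned` are illustrative only:

```python
import pandas as pd

# Same data as in the question, with string placeholders.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 2, 'NaN', 4, 5],
                   'C': [1, 2, 3, 'NaT', 5]})

# Boolean frame: True wherever a cell equals one of the placeholder strings.
is_placeholder = df.isin(['NaN', 'NaT'])

# Keep only the rows in which no column holds a placeholder.
cleaned = df[~is_placeholder.any(axis=1)]
print(cleaned)
```

Note that this only filters rows; the remaining columns B and C keep whatever dtype they had before.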

If you had a DF such as this (note the use of np.nan and np.datetime64('NaT') instead of strings):

df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})

Then running df.dropna() will give you:

   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5

Note that column B is now a float instead of an integer as that's required to store NaN values.
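
The dtype change can be seen directly, and undone once the missing rows are gone. A small sketch, assuming a reduced two-column frame rather than the full example (casting back with `astype` is one option, not something the answer itself prescribes):

```python
import numpy as np
import pandas as pd

# A real NaN forces column B to a float dtype.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 2, np.nan, 4, 5]})
print(df['B'].dtype)  # float64

# Drop the row containing NaN, then cast B back to integers.
cleaned = df.dropna()
cleaned = cleaned.astype({'B': 'int64'})
print(cleaned.dtypes)
```

The cast only succeeds after the NaN rows are removed, since a plain integer column cannot hold NaN.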

Merlin · Accepted Answer · 2019-02-11 16:06:28Z

15

Try this on the original data:

Test.replace(["NaN", 'NaT'], np.nan, inplace = True)
Test = Test.dropna()
Test

Or modify the data so that it holds real missing values, and do this:

import pandas as pd
import numpy as np 

Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)



   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5
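
The first approach can also be written without `inplace=True`, so the original frame is left untouched. A minimal sketch on the question's string data (the list name `placeholders` and the result name `cleaned` are hypothetical):

```python
import numpy as np
import pandas as pd

# Original data with string placeholders.
Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# Strings that should count as missing; extend this list as needed.
placeholders = ['NaN', 'NaT']

# Replace the placeholders with real NaN, then drop rows containing any NaN.
# replace() returns a new frame here, so Test itself is not modified.
cleaned = Test.replace(placeholders, np.nan).dropna()
print(cleaned)
```

The resulting column dtypes can differ between pandas versions, so check `cleaned.dtypes` if later code depends on them.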

edited Feb 11, 2019 at 16:06

answered Sep 6, 2016 at 3:17

1 Comment

FourZeroFive Over a year ago

I used this with Test.replace([''], np.nan, inplace=True), thanks.
