This is really weird. I have tried several ways of dropping rows with missing data from a pandas dataframe, but none of them seem to work. This is the code (I uncomment one method at a time; these are the three I have tried in various forms, and the uncommented one is the latest):

import pandas as pd
Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})
print(Test)
#Test = Test.ix[Test.C.notnull()]
#Test = Test.dropna()
Test = Test[~Test[Test.columns.values].isnull()]
print("And now")
print(Test)

But in all cases, all I get is this:

   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5
And now
   A    B    C
0  1    1    1
1  2    2    2
2  3  NaN    3
3  4    4  NaT
4  5    5    5

Am I making a mistake somewhere, or what is the problem? Ideally, I would like to get this:

   A    B    C
0  1    1    1
1  2    2    2
4  5    5    5
  • Do you actually have the strings NaN and NaT instead of np.nan and np.datetime64('NaT')? .dropna() will work correctly with the latter... Commented Sep 6, 2016 at 2:57
  • the string or np.nan didn't make any difference :( Commented Sep 6, 2016 at 7:42

2 Answers

18

Your example DF has NaN and NaT as strings, which .dropna, .notnull and co. won't treat as missing values, so given your example you can use...

df[~df.isin(['NaN', 'NaT']).any(axis=1)]

Which gives you:

   A  B  C
0  1  1  1
1  2  2  2
4  5  5  5
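To see why the original attempts left the frame unchanged, here is a minimal sketch that rebuilds the question's data (assuming the placeholders really are the strings 'NaN' and 'NaT') and counts how many cells pandas treats as missing. The variable names n_missing, bad_rows and cleaned are only illustrative.

```python
import pandas as pd

# Same data as in the question: 'NaN' and 'NaT' are ordinary strings here.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 2, 'NaN', 4, 5],
                   'C': [1, 2, 3, 'NaT', 5]})

# No cell counts as missing, so dropna() has nothing to drop.
n_missing = int(df.isnull().sum().sum())
print(n_missing)
print(len(df.dropna()))

# Flag rows that contain either placeholder string, then keep the rest.
bad_rows = df.isin(['NaN', 'NaT']).any(axis=1)
cleaned = df[~bad_rows]
print(cleaned)
```

If the missing-cell count is zero, the values are strings (or some other non-null placeholder) and need to be matched explicitly, as isin does here.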

If you had a DF such as this (note the use of np.nan and np.datetime64('NaT') instead of strings):

df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})

Then running df.dropna() would give you:

   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5

Note that column B is now a float instead of an integer as that's required to store NaN values.
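The dtype change can be checked directly. The sketch below uses a one-column frame for brevity and shows that dropna() leaves the column as float; casting back to int64 afterwards is an optional extra step, not part of the answer above.

```python
import numpy as np
import pandas as pd

# A column of integers plus one real NaN is stored as float64.
with_nan = pd.DataFrame({'B': [1, 2, np.nan, 4, 5]})
print(with_nan['B'].dtype)

# dropna() removes the row but does not convert the column back.
dropped = with_nan.dropna()
print(dropped['B'].dtype)

# Once no NaN remains, an explicit cast restores integers if desired.
restored = dropped.astype({'B': 'int64'})
print(restored['B'].dtype)
```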

15

Try this on the original data:

import numpy as np

Test.replace(['NaN', 'NaT'], np.nan, inplace=True)
Test = Test.dropna()
Test
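A possible alternative, if every column is meant to be numeric (an assumption about the data, not something the question states), is to let pd.to_numeric coerce anything unparseable to NaN before dropping rows. This is a sketch rather than the method used above; the names numeric and cleaned are illustrative.

```python
import pandas as pd

# The question's data, with placeholder strings in B and C.
Test = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                     'B': [1, 2, 'NaN', 4, 5],
                     'C': [1, 2, 3, 'NaT', 5]})

# errors='coerce' turns values that cannot be read as numbers into NaN.
numeric = Test.apply(pd.to_numeric, errors='coerce')

# With real NaN values in place, dropna() removes the affected rows.
cleaned = numeric.dropna()
print(cleaned)
print(cleaned.dtypes)
```

Unlike a plain replace, this would also turn any other non-numeric string into NaN, so it only suits frames where that is the intended behaviour.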

Or modify the data to use real missing values (np.nan and pd.NaT) and do this:

import pandas as pd
import numpy as np 

Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,pd.NaT,5]})
print(Test)
Test = Test.dropna()
print(Test)



   A    B  C
0  1  1.0  1
1  2  2.0  2
4  5  5.0  5

1 Comment

I used it with Test.replace([''], np.nan, inplace=True), ty
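For the empty-string case mentioned in this comment, a small sketch follows. The frame and its values are made up for illustration, and the regex variant that also catches whitespace-only cells is an optional extension of the commenter's approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one empty cell and one whitespace-only cell.
frame = pd.DataFrame({'A': [1, 2, 3],
                      'B': ['x', '', '  ']})

# Exact match on the empty string, as in the comment above.
exact = frame.replace([''], np.nan).dropna()
print(exact)

# Regex match that also treats whitespace-only strings as missing.
blank = frame.replace(r'^\s*$', np.nan, regex=True).dropna()
print(blank)
```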
