Dropping rows with pandas data frame when multiple Null values exist

Question

I'm trying to go through each row in a data frame, check whether the row has more than 3 null values (this part works), and then delete the entire row. However, when I try to drop those rows from the data frame, I get an error:

AttributeError: 'NoneType' object has no attribute 'index'

Forgive me if this code is inefficient, I only need it to work.

import pandas as pd

df = pd.read_csv('data/mycsv.csv')


i = 0

while i < len(df.index):
    if df.iloc[i].isnull().sum() > 3:    
        df = df.drop(df.index[i], inplace = True)
    i += 1

jezrael · Accepted Answer · 2021-11-23 14:32:18Z

Use DataFrame.dropna with thresh. Because thresh is the minimum number of non-NaN values a row needs in order to be kept, subtract N from the number of columns:

import numpy as np
import pandas as pd

np.random.seed(2021)

df = pd.DataFrame(np.random.choice([np.nan, 1], size=(5,6)))
print(df)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  NaN  NaN
3  NaN  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  NaN  1.0  NaN  NaN

N = 3
df1 = df.dropna(thresh=len(df.columns) - N)
print(df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0


N = 2
df2 = df.dropna(thresh=len(df.columns) - N)
print(df2)
    0   1    2    3    4    5
3 NaN NaN  1.0  1.0  1.0  1.0
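
To make the arithmetic behind thresh explicit, here is a minimal sketch. The helper name drop_rows_with_more_nulls_than and the sample frame are made up for illustration and are not part of pandas; the only pandas call it relies on is DataFrame.dropna(thresh=...).

```python
import numpy as np
import pandas as pd


def drop_rows_with_more_nulls_than(frame, max_nulls):
    """Return frame without the rows that hold more than max_nulls nulls.

    Hypothetical helper for illustration only.
    """
    # dropna(thresh=k) keeps rows with at least k non-null values, so
    # "at most max_nulls nulls" becomes "at least n_cols - max_nulls non-nulls".
    min_non_null = frame.shape[1] - max_nulls
    return frame.dropna(thresh=min_non_null)


# Illustrative data: rows 0..3 contain 0, 3, 4 and 5 nulls respectively.
sample = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan],
    "b": [2.0, np.nan, np.nan, np.nan],
    "c": [3.0, np.nan, np.nan, np.nan],
    "d": [4.0, 4.0, np.nan, np.nan],
    "e": [5.0, 5.0, 5.0, np.nan],
})

result = drop_rows_with_more_nulls_than(sample, max_nulls=3)
print(result.index.tolist())
```

With 5 columns and max_nulls=3, thresh works out to 2, so the row with exactly 3 nulls (index 1) is kept and the rows with 4 and 5 nulls are dropped.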

Alternatively, use boolean indexing to keep only the rows that have N or fewer NaNs:

N = 3
df1 = df[df.isnull().sum(axis=1) <= N]
print(df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0
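
Building a new frame this way also avoids the AttributeError from the question. As far as I can tell, that error comes from combining assignment with inplace=True: DataFrame.drop returns None when inplace=True, so df = df.drop(..., inplace=True) rebinds df to None and the next df.index lookup fails. The loop has a second problem as well: after a row is dropped, the following row moves into position i, and i += 1 then skips it. A small sketch with made-up data (the names original, work, returned and filtered are just for this example):

```python
import numpy as np
import pandas as pd

# Illustrative frame: row 0 has no nulls, row 1 has 4 nulls.
original = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [2.0, np.nan],
    "c": [3.0, np.nan],
    "d": [4.0, np.nan],
})

# With inplace=True, drop() modifies the frame it is called on and returns None.
work = original.copy()
returned = work.drop(work.index[1], inplace=True)
print(returned)   # None; assigning this back to the frame variable loses the frame
print(len(work))  # the row was removed from `work` itself

# No loop needed: keep rows with at most 3 nulls and assign the result.
filtered = original[original.isnull().sum(axis=1) <= 3]
print(filtered.index.tolist())
```

So either call drop with inplace=True and no assignment, or assign the result of a call without inplace, but not both.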

Corralien · Accepted Answer · 2021-11-23 14:50:08Z

Pass thresh=X to dropna, where X is the number of columns (df.shape[1]) minus the maximum number of NaNs you allow per row (3).

Suppose you have this dataframe:

>>> df
     0    1    2    3    4    5
0  NaN  NaN  NaN  NaN  NaN  NaN  # Drop (NaN = 6)
1  NaN  NaN  NaN  NaN  NaN  1.0  # Drop (NaN = 5)
2  NaN  NaN  NaN  NaN  1.0  1.0  # Drop (NaN = 4)
3  NaN  NaN  NaN  1.0  1.0  1.0  # Keep (NaN = 3)
4  NaN  NaN  1.0  1.0  1.0  1.0  # Keep (NaN = 2)
5  NaN  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 1)
6  1.0  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 0)

df = df.dropna(thresh=df.shape[1] - 3)
print(df)

     0    1    2    3    4    5
3  NaN  NaN  NaN  1.0  1.0  1.0
4  NaN  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0
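
The answer shows the frame without the code that builds it. One possible way to reproduce it and check the result is sketched below; the list-comprehension construction and the name kept are my own assumptions, not taken from the answer.

```python
import numpy as np
import pandas as pd

n_cols = 6

# Row i starts with (n_cols - i) NaNs followed by i ones, for i = 0..6,
# which matches the staircase layout of the example frame.
rows = [[np.nan] * (n_cols - i) + [1.0] * i for i in range(n_cols + 1)]
df = pd.DataFrame(rows)

# Keep rows with at least 6 - 3 = 3 non-NaN values, i.e. at most 3 NaNs.
kept = df.dropna(thresh=df.shape[1] - 3)
print(kept.index.tolist())
```

Rows 0 to 2 hold 6, 5 and 4 NaNs and are dropped; rows 3 to 6 remain.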

edited Nov 23, 2021 at 14:50

answered Nov 23, 2021 at 14:27


4 Comments

Corralien Over a year ago

??? I posted my answer 29 minutes ago. You added dropna to your answer 24 minutes ago...

jezrael Over a year ago

Sure, but correct, not wrong.

jezrael Over a year ago

And you corrected your answer 22 minutes after posting it?

Corralien Over a year ago

Sorry. I have a job...
