I'm trying to go through each row in a data frame, check whether the row has more than 3 null values (this part works), and then delete the entire row. However, when I try to drop those rows from the data frame, I get an error:

AttributeError: 'NoneType' object has no attribute 'index'

Forgive me if this code is inefficient; I only need it to work.

import pandas as pd

df = pd.read_csv('data/mycsv.csv')


i = 0

while i < len(df.index):
    if df.iloc[i].isnull().sum() > 3:    
        df = df.drop(df.index[i], inplace = True)
    i += 1

2 Answers

Use DataFrame.dropna with thresh. Because thresh is the minimum number of non-NaN values a row needs in order to be kept, subtract the allowed number of NaNs (N) from the number of columns:

import numpy as np
import pandas as pd

np.random.seed(2021)

df = pd.DataFrame(np.random.choice([np.nan, 1], size=(5,6)))
print (df)
     0    1    2    3    4    5
0  NaN  1.0  1.0  NaN  1.0  NaN
1  NaN  NaN  1.0  NaN  1.0  1.0
2  1.0  1.0  NaN  NaN  NaN  NaN
3  NaN  NaN  1.0  1.0  1.0  1.0
4  NaN  1.0  NaN  1.0  NaN  NaN

N = 3
df1 = df.dropna(thresh=len(df.columns) - N)
print(df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0


N = 2
df2 = df.dropna(thresh=len(df.columns) - N)
print(df2)
    0   1    2    3    4    5
3 NaN NaN  1.0  1.0  1.0  1.0

Alternatively, use boolean indexing to keep only the rows that have N or fewer NaNs:

N = 3
df1 = df[df.isnull().sum(axis=1) <= N]
print (df1)
    0    1    2    3    4    5
0 NaN  1.0  1.0  NaN  1.0  NaN
1 NaN  NaN  1.0  NaN  1.0  1.0
3 NaN  NaN  1.0  1.0  1.0  1.0
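
As for the error in the question, a minimal sketch of what is probably happening (the small frame below is made up for illustration and is not the asker's CSV): drop(..., inplace=True) modifies the frame and returns None, so assigning that result back to df replaces the frame with None, and the next len(df.index) fails with the AttributeError.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the behaviour
df = pd.DataFrame({
    "a": [1.0, np.nan],
    "b": [np.nan, np.nan],
    "c": [3.0, np.nan],
    "d": [4.0, np.nan],
})

# With inplace=True the frame is changed in place and None is returned
result = df.drop(df.index[1], inplace=True)

print(result)   # None
print(len(df))  # 1
```

So either drop the assignment or drop inplace=True, but do not combine them. Note also that removing rows while stepping through positions with i += 1 can skip the row that slides into the freed position, which is another reason to prefer the vectorised approaches above.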

Use thresh=X as a parameter of dropna, where X is the number of columns (df.shape[1]) minus the maximum number of NaNs you allow per row (3).

Suppose you have this dataframe:

>>> df
     0    1    2    3    4    5
0  NaN  NaN  NaN  NaN  NaN  NaN  # Drop (NaN = 6)
1  NaN  NaN  NaN  NaN  NaN  1.0  # Drop (NaN = 5)
2  NaN  NaN  NaN  NaN  1.0  1.0  # Drop (NaN = 4)
3  NaN  NaN  NaN  1.0  1.0  1.0  # Keep (NaN = 3)
4  NaN  NaN  1.0  1.0  1.0  1.0  # Keep (NaN = 2)
5  NaN  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 1)
6  1.0  1.0  1.0  1.0  1.0  1.0  # Keep (NaN = 0)
df = df.dropna(thresh=df.shape[1] - 3)
print(df)

     0    1    2    3    4    5
3  NaN  NaN  NaN  1.0  1.0  1.0
4  NaN  NaN  1.0  1.0  1.0  1.0
5  NaN  1.0  1.0  1.0  1.0  1.0
6  1.0  1.0  1.0  1.0  1.0  1.0
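
To make the arithmetic reusable, here is a small sketch that wraps the call in a helper and checks it against a boolean mask on the NaN counts. The function name drop_rows_with_many_nans and the max_nans parameter are my own naming, not part of pandas; the frame is rebuilt to match the one shown above.

```python
import numpy as np
import pandas as pd


def drop_rows_with_many_nans(df, max_nans=3):
    """Keep rows that have at most max_nans missing values (hypothetical helper)."""
    # thresh counts non-NaN values, so convert the NaN limit accordingly
    return df.dropna(thresh=df.shape[1] - max_nans)


# Rebuild the 7x6 example: row i starts with (6 - i) NaNs, then i ones
data = [[np.nan] * (6 - i) + [1.0] * i for i in range(7)]
df = pd.DataFrame(data)

kept = drop_rows_with_many_nans(df, max_nans=3)

# Same selection expressed as a boolean mask on the per-row NaN count
mask_kept = df[df.isnull().sum(axis=1) <= 3]

print(kept.index.tolist())     # [3, 4, 5, 6]
print(kept.equals(mask_kept))  # True
```

Both forms return a new frame and leave the original untouched, so assign the result back (df = ...) if you want to replace it.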

4 Comments

??? I posted my answer 29 minutes ago. You added dropna to your answer 24 minutes ago...
Sure, but correct, not wrong.
And you corrected your answer 22 minutes after posting it?
Sorry. I have a job...