python pandas - display first cell in rows with NULL values in selected column

Question

Suppose I have this csv data:

id_column   col_name2   col_name3   
id1         value1      value1      
id2         value2                  
id3                     value2      
id4         value3                         

# The user selects number 3 (corresponding to col_name3), so I do:

import pandas

df = pandas.read_csv("file.csv")
col = df.columns[2]  # columns are 0-indexed, so the 3rd column is index 2
df_col = pandas.read_csv("file.csv", usecols=[col])

# print(df_col.isnull())
# maybe iterate through df_col values to catch NULL values
# print only id2 and id4

How can I display just id2 and id4, the rows with NULL cells in col_name3?

I let the user select the column. If, for instance, the user selects col_name3 as above, I want to automatically display the id(s) in id_column where NULL values exist in the selected column.

So, if the user chooses col_name3, ONLY id2 and id4 should be displayed. If the user chooses col_name2, ONLY id3 should be displayed.

Hi! Are any of the answers below working? If so, and if you wish, you might consider accepting one of them to signal to others that the issue is resolved. If not, you can provide feedback so they can be improved (or removed altogether).
– Anurag Dabas, Commented Aug 14, 2021 at 7:57

OmerM25 · Accepted Answer · 2021-06-27 14:21:27Z

2

If I understand you correctly, what you want to do should look like this:

import pandas as pd

df = pd.DataFrame({'name': ['id1', 'id2', 'id3'], 'a': [20, None, 30], 'b': [10, 40, None]})

df[df.isna().any(axis=1)].iloc[:, 0]

This returns id2 and id3.

Explanation:

df.isna() returns a boolean DataFrame that marks each null cell. .any(axis=1) then reduces across the columns, giving True for every row that has at least one null (in contrast to .all(), which would require every cell in the row to be null).

Finally, .iloc[:, 0] selects the first column. This is optional: drop it if you want all the columns of the rows that contain at least one null.
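To make the three steps concrete, here is a minimal sketch that names each intermediate result. The variable names (null_cells, row_has_null, first_col) are only illustrative and not part of the original answer.

```python
import pandas as pd

# Same toy frame as in the answer above.
df = pd.DataFrame({
    "name": ["id1", "id2", "id3"],
    "a": [20, None, 30],
    "b": [10, 40, None],
})

# Step 1: boolean DataFrame, True wherever a cell is null.
null_cells = df.isna()

# Step 2: boolean Series, True for rows with at least one null cell.
row_has_null = null_cells.any(axis=1)

# Step 3: keep the matching rows, then take only the first column.
first_col = df[row_has_null].iloc[:, 0]

print(first_col.tolist())  # ['id2', 'id3']
```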

Edit to answer your edits:

To let the user select the column, add a call to input():

chosen_column = input(f"Please choose one of the following columns: {list(df.columns)}")

# Filter by na and display only the chosen column
df[df.isna().any(axis=1)][chosen_column]
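Note that the expression above keeps every row with a null in any column and then shows the chosen column's values. If the goal is instead the ids of the rows where the chosen column itself is null, as described in the question, a minimal sketch could look like the following. The helper name ids_with_nulls and the hard-coded frame are assumptions for illustration; in practice the frame would come from read_csv and the column name from input().

```python
import pandas as pd

# Data mirroring the table in the question (None marks an empty cell).
df = pd.DataFrame({
    "id_column": ["id1", "id2", "id3", "id4"],
    "col_name2": ["value1", "value2", None, "value3"],
    "col_name3": ["value1", None, "value2", None],
})


def ids_with_nulls(frame, chosen_column, id_column="id_column"):
    """Return the ids of rows where chosen_column is null (hypothetical helper)."""
    # Build the mask from the chosen column only, not from the whole frame.
    mask = frame[chosen_column].isna()
    return frame.loc[mask, id_column].tolist()


print(ids_with_nulls(df, "col_name3"))  # ['id2', 'id4']
print(ids_with_nulls(df, "col_name2"))  # ['id3']
```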

I hope I understood you correctly and this is what you were aiming for.

edited Jun 27, 2021 at 14:21

answered Jun 27, 2021 at 14:09

OmerM25


3 Comments

JON Over a year ago

Hi, thanks I will try this. I want the user to select the column and if, for instance, the user selected col_name3, I want to automatically display the id(s) in first column where NULL values exist in col_name3.

OmerM25 Over a year ago

@seneca I edited the post to address your edits, let me know if this is what you were aiming for.

JON Over a year ago

Sorry I had an issue with id(s) in first column. The idea is to display only id2 and id4 if the user selects 3rd column, or only id3 if user selects 2nd column.

Anurag Dabas · Accepted Answer · 2021-06-27 17:12:32Z

1

You can make a custom function for that purpose:

def print_id(col, df=df):
    # Accept either a single column name or a list of column names.
    if isinstance(col, list):
        # Keep rows where at least one of the listed columns is null.
        mask = df[col].isna().any(axis=1)
    else:
        mask = df[col].isna()
    return df.loc[mask, 'id_column'].reset_index(drop=True)

Finally, call the function with the input given by the user:

print_id('col_name3')
#OR
print_id('col_name3',df)
#OR
print_id(['col_name3','col_name2'])
#OR
print_id(['col_name3','col_name2'],df)

OR

If you want the user to enter 2 and have it select col_name2, use:

def print_id(like, df=df):
    if isinstance(like, list):
        print("like parameter doesn't support multiple values")
        return None
    # Select the columns whose name contains str(like), e.g. 2 -> col_name2.
    mask = df.filter(like=str(like)).isna().any(axis=1)
    return df.loc[mask, 'id_column'].reset_index(drop=True)

Finally, call the function with the input given by the user:

print_id(2)
#OR
print_id(3,df)
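One caveat with filter(like=...) is that it matches every column whose name contains the given text, so 2 would also match a name such as col_name12. As an alternative sketch, assuming the number the user types is the 1-based position of the column, the column can be looked up by position instead. The helper name ids_by_position and the sample frame are hypothetical.

```python
import pandas as pd

# Data mirroring the table in the question (None marks an empty cell).
df = pd.DataFrame({
    "id_column": ["id1", "id2", "id3", "id4"],
    "col_name2": ["value1", "value2", None, "value3"],
    "col_name3": ["value1", None, "value2", None],
})


def ids_by_position(frame, position, id_column="id_column"):
    """Return ids where the column at 1-based `position` is null (hypothetical helper)."""
    if not 1 <= position <= len(frame.columns):
        raise ValueError(f"position must be between 1 and {len(frame.columns)}")
    # Convert the user's 1-based choice to pandas' 0-based column index.
    column = frame.columns[position - 1]
    return frame.loc[frame[column].isna(), id_column].reset_index(drop=True)


print(ids_by_position(df, 3).tolist())  # ['id2', 'id4']
print(ids_by_position(df, 2).tolist())  # ['id3']
```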

edited Jun 27, 2021 at 17:12

answered Jun 27, 2021 at 14:20

Anurag Dabas


3 Comments

JON Over a year ago

Hi, when I call the function print(print_id(col)) it returns all columns and rows. Not useful in my case.

JON Over a year ago

Sorry, I had an issue with indexing in the first column. I fixed it now, so maybe it's clearer.

Anurag Dabas Over a year ago

@seneca updated answer...kindly have a look :)
