
Suppose I have this csv data:

id_column   col_name2   col_name3   
id1         value1      value1      
id2         value2                  
id3                     value2      
id4         value3                         

# The user selects number 3 (corresponding to col_name3), so I do:

import pandas

df = pandas.read_csv("file.csv")
col = df.columns[3 - 1]  # the user's choice is 1-based; df.columns is 0-based
df_col = pandas.read_csv("file.csv", usecols=[col])

#print(df_col.isnull())
#maybe iterate through df_col values to catch NULL values 
#print only id2 and id4

How can I display just id2 and id4, the ids of the rows whose col_name3 cell is NULL?

I let the user select the column. If, for instance, the user selects col_name3 as above, I want to automatically display the id(s) from id_column for the rows where the selected column is NULL.

So, if the user chooses col_name3, ONLY id2 and id4 should be displayed. If the user chooses col_name2, ONLY id3 should be displayed.

  • hi! Is any one of the answers below working? If so & if you wish, you might consider accepting one of them to signal others that the issue is resolved. If not, you can provide feedback so they can be improved (or removed altogether) Commented Aug 14, 2021 at 7:57

2 Answers

If I understand you correctly, what you want to do should look like this:

import pandas as pd

df = pd.DataFrame({'name': ['id1', 'id2', 'id3'], 'a': [20, None, 30], 'b': [10, 40, None]})

df[df.isna().any(axis=1)].iloc[:, 0]

This returns id2 and id3.

Explanation:

df.isna() returns a boolean frame of the same shape as df, with True wherever a cell is null. .any(axis=1) then reduces it to one boolean per row, True if the row contains at least one null (.all(axis=1) would instead require every cell in the row to be null).

Finally, .iloc[:, 0] selects the first column of the filtered rows. It is optional: drop it if you want every column of the rows that contain at least one null.
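
To see each step in isolation, here is a minimal sketch using the same sample frame as above. The intermediate variable names (null_mask, rows_with_null, first_col) are mine and only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['id1', 'id2', 'id3'], 'a': [20, None, 30], 'b': [10, 40, None]})

# Step 1: boolean frame with the same shape as df, True where a cell is null
null_mask = df.isna()

# Step 2: one boolean per row, True if any cell in that row is null
rows_with_null = null_mask.any(axis=1)

# Step 3: keep the flagged rows, then take the first column by position
first_col = df[rows_with_null].iloc[:, 0]

print(rows_with_null.tolist())  # [False, True, True]
print(first_col.tolist())       # ['id2', 'id3']
```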

Edit, in response to your update:

To let the user choose the column, add a call to input():

chosen_column = input(f"Please choose one of the following columns: {list(df.columns)}")

# Filter by na and display only the chosen column
df[df.isna().any(axis=1)][chosen_column]
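
Note that the filter above keeps rows with a null in any column and then shows the chosen column. If the goal is instead the ids of the rows where the chosen column itself is null, a sketch along these lines may be closer. It assumes the id column is 'name', as in the sample frame, and hard-codes chosen_column in place of the input() call so that it runs non-interactively.

```python
import pandas as pd

df = pd.DataFrame({'name': ['id1', 'id2', 'id3'], 'a': [20, None, 30], 'b': [10, 40, None]})

# Stand-in for the value returned by input(); an assumption for this example
chosen_column = 'b'

# Build the mask from the chosen column only, then select the id column
ids_with_null = df.loc[df[chosen_column].isna(), 'name']

print(ids_with_null.tolist())  # ['id3']
```

Using .loc with a boolean mask and a column label does the row filter and the column selection in one step, which avoids chained indexing.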

I hope I understood you correctly and this is what you were aiming for.

3 Comments

Hi, thanks I will try this. I want the user to select the column and if, for instance, the user selected col_name3, I want to automatically display the id(s) in first column where NULL values exist in col_name3.
@seneca I edited the posts with your edits, let me know if this is what you were aiming for.
Sorry I had an issue with id(s) in first column. The idea is to display only id2 and id4 if the user selects 3rd column, or only id3 if user selects 2nd column.

You can make a custom function for that purpose:

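# df must already exist here, because it is used as the default argument
def print_id(col, df=df):
    # col may be a single column name or a list of column names
    if isinstance(col, list):
        # keep rows where at least one of the listed columns is null
        return df.loc[df[col].isna().any(axis=1), 'id_column'].reset_index(drop=True)
    else:
        # keep rows where the single column is null
        return df.loc[df[col].isna(), 'id_column'].reset_index(drop=True)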

Finally, call the function with the column chosen by the user:

print_id('col_name3')
#OR
print_id('col_name3',df)
#OR
print_id(['col_name3','col_name2'])
#OR
print_id(['col_name3','col_name2'],df)

Alternatively, if you want the user to enter 2 and have it select col_name2, use:

def print_id(like, df=df):
    if isinstance(like, list):
        print("like parameter doesn't support multiple values")
        return None
    else:
        # select the columns whose name contains str(like), e.g. 2 -> col_name2
        return df.loc[df.filter(like=str(like)).isna().any(axis=1), 'id_column'].reset_index(drop=True)

Finally, call the function with the number entered by the user:

print_id(2)
#OR
print_id(3,df)
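
One caveat with filter(like=...): it matches by substring, so 2 would also match a column named, say, col_name12. If the number is meant as a column position, as in the question, a positional lookup may be safer. The following is only a sketch: the helper name ids_with_null_in is hypothetical, and the sample data is the question's table written out as comma-separated text, which is an assumption about the real file's layout.

```python
import io
import pandas as pd

# The question's table as comma-separated text (assumed layout of file.csv)
csv_text = """id_column,col_name2,col_name3
id1,value1,value1
id2,value2,
id3,,value2
id4,value3,
"""
df = pd.read_csv(io.StringIO(csv_text))

def ids_with_null_in(df, column_number, id_col='id_column'):
    """Return the ids whose cell in the column_number-th column (1-based) is null."""
    # Convert the user's 1-based choice to a 0-based position in df.columns
    col = df.columns[column_number - 1]
    return df.loc[df[col].isna(), id_col].tolist()

print(ids_with_null_in(df, 3))  # ['id2', 'id4']
print(ids_with_null_in(df, 2))  # ['id3']
```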

3 Comments

Hi, when I call the function print(print_id(col)) it returns all columns and rows. Not useful in my case.
Sorry, I had an issue with the indexing in the first column. I have fixed it now, so maybe it's clearer.
@seneca updated answer...kindly have a look :)
