python pandas column based on multiple if else conditions

Question

I have four columns in my pandas DataFrame, say A, B, C and D, each mapped to a field in the UI. Each has its own purpose, but users are entering the information meant for field A in any of A, B, C or D. I am trying to clean the data and bring it into column A for analysis.

If there is any value in column A, I don't care about the values in B, C or D. But if column A is empty, I have to look for the user's entry in the other columns and bring it into column A. Valid values for column A always start with one of the values from our list. So if column A has no data, check column B: if it holds a value from our list, copy it to A. If column B is null or holds some other value, skip it and check column C the same way, then column D. How can I do this in Python?

Please let me know if anything is unclear.

Example,

mylist = ['senior','junior','midlevel']

inputdf

 A        B      C          D
senior  male   senior     UK
        senior candidate  USA
        female junior     
junior  male   junior     AU
        male   candidate  midlevel
        female candidate  AU


outputdf

 A         B       C          D
senior    male    senior     UK
senior    senior  candidate  USA
junior    female  junior
junior    male    junior     AU
midlevel  male    candidate  midlevel
          female  candidate  AU

Anirudh Sridhar · Accepted Answer · 2017-06-11 13:52:55Z


You can use the apply function with axis=1 to iterate over the rows of df and assign the returned value to column 'A'.

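def func(row):
    # Scan the row's values in column order (A, B, C, D).
    # Series.iteritems() was removed in pandas 2.0; items() replaces it.
    for index_val, series_val in row.items():
        if series_val in mylist:
            # Return the first value that appears in mylist.
            return series_val

df['A'] = df.apply(func, axis=1)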

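This code checks whether the value in A is present in mylist. If it is, the function returns that value; otherwise it moves on to check B, then C, then D. If no column holds a value from mylist, the function returns None, so A stays empty for that row.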
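For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's inputdf and applies the same row-wise lookup. It assumes empty cells are stored as None/NaN rather than empty strings, and it uses row.items(), which is the current spelling of iteritems() in pandas 2.0 and later. The DataFrame literal is typed in from the example in the question.

```python
import pandas as pd

mylist = ['senior', 'junior', 'midlevel']

# Input from the question; blank cells are assumed to be missing values (None)
df = pd.DataFrame({
    'A': ['senior', None, None, 'junior', None, None],
    'B': ['male', 'senior', 'female', 'male', 'male', 'female'],
    'C': ['senior', 'candidate', 'junior', 'junior', 'candidate', 'candidate'],
    'D': ['UK', 'USA', None, 'AU', 'midlevel', 'AU'],
})

def func(row):
    # Walk the row in column order and return the first value found in mylist
    for index_val, series_val in row.items():
        if series_val in mylist:
            return series_val
    # No column matched: leave A empty for this row
    return None

df['A'] = df.apply(func, axis=1)
print(df)
```

Rows where no column holds a value from mylist come back empty, which matches the blank A in the last row of the expected output. Note that this is an exact match; if the real values only start with a list entry, as the question suggests, a prefix test such as str.startswith would be needed instead.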

edited Jun 11, 2017 at 13:52

answered Jun 10, 2017 at 11:52

Anirudh Sridhar


5 Comments

ds_user Over a year ago

Thanks. However, in some cases mylist values appear more than once in a row, for example no value in column A but junior in both column B and column C. In that case, will this write a duplicate to column A? How can we stop it from checking further columns once it finds the first match?

Anirudh Sridhar Over a year ago

Once a value is returned, no further comparisons are made. As soon as the return statement executes, the function exits, so the loop stops and the remaining columns are never checked. If you are still facing issues, please add some more examples (before and after running the code).
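To illustrate the point about return ending the scan, here is a small sketch. first_match is a hypothetical helper (not part of the answer) that also reports which column matched, run on a made-up row with junior in both B and C.

```python
import pandas as pd

mylist = ['senior', 'junior', 'midlevel']

def first_match(row):
    """Hypothetical helper: return (column label, value) for the first hit."""
    for label, value in row.items():
        if value in mylist:
            # return exits the function, so later columns are never examined
            return label, value
    return None, None

# Made-up row: A is empty, junior appears in both B and C
row = pd.Series({'A': None, 'B': 'junior', 'C': 'junior', 'D': 'AU'})
label, value = first_match(row)
print(label, value)  # B junior
```

Only the match in B is reported; C is never reached, so nothing is written twice.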

ds_user Over a year ago

Thanks, got it. But now I am getting a different error: AttributeError: ("'Series' object has no attribute 'columns'". I think the apply function passes only one column at a time to the function.

ds_user Over a year ago

Hi, thanks, I sorted out the issue. I used for index_val, series_val in s.iteritems(): if series_val in mylist: return series_val. This works because apply passes each row as a Series, and iterating over it with iteritems() yields (label, value) tuples. Please update your answer with this and I will accept it.
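The error above is consistent with how apply behaves with axis=1: each row reaches the function as a Series, which has an index but no columns attribute. A minimal sketch, using a made-up one-row frame:

```python
import pandas as pd

# Made-up one-row frame, for illustration only
df = pd.DataFrame({'A': [''], 'B': ['junior'], 'C': ['candidate'], 'D': ['AU']})

# With axis=1, apply calls the function once per row and passes a Series
row_types = df.apply(lambda row: type(row).__name__, axis=1).tolist()

first_row = df.iloc[0]
# A Series has an index (here the column labels) but no columns attribute
has_columns = hasattr(first_row, 'columns')

# items() yields (column label, value) pairs in column order
pairs = list(first_row.items())
print(row_types, has_columns, pairs)
```

Series.iteritems() was removed in pandas 2.0, so on current versions items() is the call to use.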

Anirudh Sridhar Over a year ago

I have made the changes.
