Python Pandas - Lookup a variable column depending on another column's value

Question

I'm trying to use the value of one cell to find the value of a cell in another column. The first cell value ('source') dictates which column to look up.

import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
})

print(df)

        A       B source
0    John    Fred      A
1  Andrew   Simon      B
2     Bob  Andrew      A
3    Fred  Andrew      B

The 'output' column should hold, for each row, the value from the column named in 'source':

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Failed attempts

df['output'] = df[df['source']]

This results in ValueError: Wrong number of items passed 4, placement implies 1, because df['source'] passes in a Series, not a string. I tried converting to a string using:

df['output'] = df[df['source'].convertDTypes(convert_string=True)]

which gave the error AttributeError: 'Series' object has no attribute 'convertDTypes'.
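For reference, a small sketch of why the first attempt fails, assuming the example DataFrame from above: indexing a DataFrame with a Series of labels selects one column per label, so the right-hand side is a four-column DataFrame rather than a single column. The name `selected` is only for illustration. (The real method is spelled `convert_dtypes`, but it also returns a Series, so it would not help here.)

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
})

# A Series of labels is treated like a list of column names,
# so this picks columns A, B, A, B (one per row of 'source').
selected = df[df['source']]

print(selected.shape)          # (4, 4)
print(list(selected.columns))  # ['A', 'B', 'A', 'B']
```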

Working solution

I found a working solution by iterating through the rows:

for index, row in df.iterrows():
    column = df.loc[index, 'source']
    df.at[index, 'output'] = df.loc[index, column]

However, this post suggests iterating is a bad idea. The code doesn't seem very elegant, either.

I feel I've missed something basic here; this really should not be that hard.

Does this answer your question? Pandas - select column using other column value as column name
– Leif Metcalf, Commented Jan 26, 2022 at 2:01

BENY · Accepted Answer · 2021-03-20 18:56:37Z

8

Let's do it the NumPy way, since lookup will no longer work in future versions of pandas:

df['new'] = df.values[df.index,df.columns.get_indexer(df.source)]
df
Out[339]: 
        A       B source     new
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew
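Note that `df.values[df.index, ...]` uses the index labels as row positions, which only works with the default 0..n-1 index. A minimal sketch of a position-based variant, assuming a hypothetical non-default index (10, 20, 30, 40) for which the original line would index out of bounds:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
}, index=[10, 20, 30, 40])  # hypothetical non-default index

rows = np.arange(len(df))                    # row positions, not labels
cols = df.columns.get_indexer(df['source'])  # column positions; -1 if a name is missing

# get_indexer returns -1 for unknown names, which would silently
# select the last column, so fail loudly instead.
if (cols == -1).any():
    raise KeyError("some values in 'source' are not column names")

df['new'] = df.to_numpy()[rows, cols]
print(df)
```

The explicit `-1` check is a design choice for this sketch; without it, a typo in 'source' would return a value from the wrong column instead of raising.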

answered Mar 20, 2021 at 18:56


1 Comment

ITSM-79 Over a year ago

Thank you! This is the easiest solution so far: very easy to read. Thanks.

Vishnudev Krishnadas · Accepted Answer · 2021-03-20 19:00:02Z

4

Use numpy.where

import numpy as np

df['output'] = np.where(df.source == 'A', df.A, df.B)

If you have more columns, use numpy.select

conditions = [df.source == 'A', df.source == 'B']
values = [df.A, df.B]
df['output'] = np.select(conditions, values)
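When the set of 'source' values is not known in advance, the two lists can be built from the data instead of being written out by hand. This is only a sketch, assuming every value in 'source' is the name of an existing column; `names` and `choices` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
})

# One condition and one choice per distinct column name found in 'source'.
names = df['source'].unique()
conditions = [df['source'] == name for name in names]
choices = [df[name] for name in names]

# default=None is used for rows where no condition matches.
df['output'] = np.select(conditions, choices, default=None)
print(df)
```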

edited Mar 20, 2021 at 19:00

answered Mar 20, 2021 at 18:54


1 Comment

ITSM-79 Over a year ago

Thanks. In my actual problem, the 'source' column has at least eight different values, each of which could reference a different column. Any kind of hard-coded reference to 'A' or 'B' will not work. The solution needs to go from reading the 'source' to looking up the correct column, and then finding the value in the same row as the 'source' to generate the 'output'.

ashkangh · Accepted Answer · 2021-03-20 18:57:25Z

2

Try this:

df['output'] = df.apply(lambda x: x[x.source], axis=1)

Output:

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

answered Mar 20, 2021 at 18:57


1 Comment

ITSM-79 Over a year ago

Love this solution for my simple problem. It's noted elsewhere that using 'apply' is slow, but this works here and is way better than my iterrows solution.

anky · Accepted Answer · 2021-03-20 20:52:52Z

2

Stack, then use loc with a MultiIndex (for recent versions of pandas):

df['output'] = df.stack().loc[zip(df.index,df['source'])].droplevel(-1)

or:

df['output'] = (df.stack().loc[pd.MultiIndex.from_arrays((df.index,df['source']))]
                .droplevel(1))

For earlier versions of pandas (DataFrame.lookup was deprecated in 1.2.0 and removed in 2.0):

df['output'] =  df.lookup(df.index,df['source'])

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew
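To see what the stack approach is doing, it can help to build the intermediate objects one at a time. A sketch using the example DataFrame; the names `stacked`, `keys`, and `picked` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
})

# Long format: one value per (row label, column name) pair.
stacked = df.stack()
print(stacked)

# The (row label, column name) pairs we want to pick out.
keys = pd.MultiIndex.from_arrays((df.index, df['source']))
print(keys)

# Select those pairs, then drop the column-name level so the
# result aligns with the original row index on assignment.
picked = stacked.loc[keys]
df['output'] = picked.droplevel(1)
print(df)
```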

edited Mar 20, 2021 at 20:52

answered Mar 20, 2021 at 18:54


3 Comments

ITSM-79 Over a year ago

Well, I'm going back to the drawing board to figure this one out. It works, and it worked on my full-size working DataFrame, but it will take me a while to figure out all the different elements, as I have never used stack or MultiIndex. Thanks for your help!

anky Over a year ago

@ITSM-79 it would be easy if you print df.stack() and pd.MultiIndex.from_arrays((df.index,df['source'])) separately; then it's just matching the indexes and dropping the extra index level :)

anky Over a year ago

@ITSM-79 Added another method with zip

Anurag Dabas · Accepted Answer · 2021-04-17 14:39:05Z

0

You can also do this simply with enumerate(), a list comprehension and the loc[] accessor:

df['output']=[df.loc[x,y] for x,y in enumerate(df['source'])]

Now if you print df, you will get your desired output:

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew
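Because enumerate() yields positions 0, 1, 2, ... while loc[] expects index labels, the line above depends on the default index. A possible variant, sketched here with a hypothetical string index, pairs each row label with its 'source' value via zip() and uses the scalar accessor at[]:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['John', 'Andrew', 'Bob', 'Fred'],
    'B': ['Fred', 'Simon', 'Andrew', 'Andrew'],
    'source': ['A', 'B', 'A', 'B'],
}, index=['w', 'x', 'y', 'z'])  # hypothetical non-default index

# Pair each row label with the column name stored in that row.
df['output'] = [df.at[row, col] for row, col in zip(df.index, df['source'])]
print(df)
```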

answered Apr 17, 2021 at 14:39
