7

I'm trying to use the value of one cell to find the value of a cell in another column. The first cell value ('source') dictates which column to lookup.

import pandas as pd

df = pd.DataFrame({'A': ['John', 'Andrew', 'Bob', 'Fred'], 'B': [
                  'Fred', 'Simon', 'Andrew', 'Andrew'], 'source': ['A', 'B', 'A', 'B']}, )

print(df)

        A       B source
0    John    Fred      A
1  Andrew   Simon      B
2     Bob  Andrew      A
3    Fred  Andrew      B

My required output value in the 'output' column is a lookup of the 'source':

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

Failed attempts

df['output'] = df[df['source']]

This results in a ValueError: Wrong number of items passed 4, placement implies 1 because the df['source'] passes in a Series, not a string. I tried converting to a string using:

df['output'] = df[df['source'].convertDTypes(convert_string=True)]

which gave error AttributeError: 'Series' object has no attribute 'convertDTypes'.

Working solution

I found a solution might by iterating through the rows using:

for index, row in df.iterrows():
    column = df.loc[index, 'source']
    df.at[index, 'output'] = df.loc[index, column]

However, this post suggests iterating is a bad idea. The code doesn't seem very elegant, either.

I feel I've missed something basic here; this really should not be that hard.

1

5 Answers 5

8

Let us do numpy way since lookup will not longer work in the future version

df['new'] = df.values[df.index,df.columns.get_indexer(df.source)]
df
Out[339]: 
        A       B source     new
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you! This is the easiest solution so far: very easy to read. Thanks.
4

Use numpy.where

df['output'] = np.where(df.source == 'A', df.A, df.B)

If you have more columns, use numpy.select

conditions = [df.source == 'A', df.source == 'B']
values = [df.A, df.B]
df['output'] = np.select(conditions, values)

1 Comment

Thanks. In my actual problem, the 'source' column has at least eight different values which could all reference a different column. Using any kind of hard referencing to 'A' or 'B' will not work. The solution needs to go from reading the 'source' to looking up the correct column, and the finding the value in the same row as the 'source' to generate the 'output'.
2

Try this:

df['output'] = df.apply(lambda x: x[x.source], axis=1)

Output:

    A         B source  output
0   John    Fred    A   John
1   Andrew  Simon   B   Simon
2   Bob     Andrew  A   Bob
3   Fred    Andrew  B   Andrew

1 Comment

Love this solution for my simple problem. It's noted elsewhere that using 'apply' is slow, but this works here and is way better than my iterrows solution.
2

Stack and then loc with multiindex for recent versions:

df['output'] = df.stack().loc[zip(df.index,df['source'])].droplevel(-1)

or:

df['output'] = (df.stack().loc[pd.MultiIndex.from_arrays((df.index,df['source']))]
                .droplevel(1))

For earlier versions of pandas:

df['output'] =  df.lookup(df.index,df['source'])

        A       B source  output
0    John    Fred      A    John
1  Andrew   Simon      B   Simon
2     Bob  Andrew      A     Bob
3    Fred  Andrew      B  Andrew

3 Comments

Well, I'm going back to the drawing board to figure out this one. It works. and worked on my full size working DataFrame, but it will take me a while to figure out all the difference elements as I have never used stack or MultiIndex. Thanks for your help!
@ITSM-79 it would be easy if you print df.stack() and pd.MultiIndex.from_arrays((df.index,df['source'])) seperately, then its just matching the indexes and dropping the extra index :)
@ITSM-79 Added another method with zip
0

You can also do this simply by enumerate() ,list comprehension and loc[] accessor

df['output']=[df.loc[x,y] for x,y in enumerate(df['source'])]

Now If you print df you will get your desired output:

      A     B       source      output
0   John    Fred        A       John
1   Andrew  Simon       B       Simon
2   Bob     Andrew      A       Bob
3   Fred    Andrew      B       Andrew

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.