
I have a list of different colours, all stored as strings.

Preferredcolours = ['red','yellow','green', 'blue']

I have a pandas array that contains information about cars. One of the columns, DfCar['colour'], holds the colours of these cars.
I want to create a new column in my data frame, named PreferredMathcing, which equals 1 if the value in the colour column matches one of the colours in the list, and 0 otherwise. How can I use a for loop to solve this?

Ideally, I want a result like this:

+=================+============================+
| DfCar['colour'] | DfCar['PreferredMathcing'] |
+=================+============================+
| white           |                          0 |
+-----------------+----------------------------+
| yellow          |                          1 |
+-----------------+----------------------------+
| black           |                          0 |
+-----------------+----------------------------+
| purple          |                          0 |
+-----------------+----------------------------+
| green           |                          1 |
+-----------------+----------------------------+
4
  • By pandas array, do you mean a dataframe? Commented Jun 24, 2019 at 12:46
  • Please provide an example input as well. Commented Jun 24, 2019 at 12:48
  • 1
    df['PreferredMatching'] = df.colour.isin(Preferredcolours).astype(int) Commented Jun 24, 2019 at 12:56
  • 1
    @Saif you got a lot of working solutions. If your data is big, I suggest you benchmark them and choose the one that performs best. In my experience, using apply(...) for simple stuff can take 20 to 30 times longer than a dedicated function: half an hour instead of 1 minute, or a full day instead of 1 hour. Commented Jun 24, 2019 at 12:56

4 Answers

1

You can use .isin(), which returns a boolean Series with True/False for each row, depending on whether that row's value is in a list of values. Then use .astype(int) to get 1/0 instead.

Try this:

import pandas as pd

df = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

df["PreferredMathcing"] = df['colour'].isin(Preferredcolours).astype(int)

print(df)

output:

   colour  PreferredMathcing
0   white                  0
1  yellow                  1
2   black                  0
3  purple                  0
4   green                  1

NOTE:

A solution built on a pure library function will likely outperform one that uses apply with custom Python logic.

Benchmarking the two against each other on my machine suggests .isin() is almost 8 times faster:

with '.isin()': 1.0591506958007812
with '.apply()': 8.234664678573608
ratio: 7.774780974248154
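The benchmark code itself is not shown above, so here is a minimal sketch of how such a comparison might be set up with timeit. The data size, the repeat count, and the helper names with_isin and with_apply are assumptions for illustration; the absolute timings and the ratio will differ from machine to machine.

```python
import timeit

import pandas as pd

Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Assumed test data: the five example colours repeated to get 10,000 rows.
df = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green'] * 2_000})


def with_isin():
    # Vectorised membership test, then cast True/False to 1/0.
    return df['colour'].isin(Preferredcolours).astype(int)


def with_apply():
    # Row-wise apply running custom Python logic for every row.
    return df.apply(lambda row: 1 if row['colour'] in Preferredcolours else 0, axis=1)


# Sanity check: both approaches should give the same values.
same_result = with_isin().tolist() == with_apply().tolist()

t_isin = timeit.timeit(with_isin, number=3)
t_apply = timeit.timeit(with_apply, number=3)

print(f"with '.isin()': {t_isin}")
print(f"with '.apply()': {t_apply}")
print(f"ratio: {t_apply / t_isin}")
```

Increase the row count or the number of repeats if you want more stable numbers on your own data.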


1

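The following will give you the desired output:

# Return 1 if the row's colour is in the preferred list, otherwise 0.
def check_colour(x, Preferredcolours):
    return 1 if x['colour'] in Preferredcolours else 0

# axis=1 passes each row to check_colour; args supplies the list of colours.
DfCar['PreferredMathcing'] = DfCar.apply(check_colour, args=(Preferredcolours,), axis=1)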
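Since the question asks specifically about a for loop, here is a minimal sketch of the same check written as an explicit loop. It assumes the example data from the question, and the name flags is just for illustration. It is usually slower than the library functions, but it shows what is happening row by row.

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Build the 1/0 flags one colour at a time.
flags = []
for colour in DfCar['colour']:
    flags.append(1 if colour in Preferredcolours else 0)

# Assign the finished list as a new column (same length and order as the rows).
DfCar['PreferredMathcing'] = flags

print(DfCar)
```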


1

You can use np.where like below:

import pandas as pd
import numpy as np

DfCar = pd.DataFrame.from_dict({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red','yellow','green', 'blue']

DfCar['PreferredMathcing'] = np.where(DfCar['colour'].isin(Preferredcolours), 1, 0)


0

Assuming DfCar is your DataFrame:

Preferredcolours = ['red','yellow','green', 'blue']    
DfCar['PreferredMatching'] = DfCar['colour'].apply(lambda x: x in Preferredcolours)

This applies the lambda function to every element in your "colour" column: it checks whether the element is in Preferredcolours and returns True or False.
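Because this yields True/False rather than the 1/0 shown in the question, a cast is needed if you want integers. A minimal sketch, assuming the example data from the question (the intermediate name matches is only for illustration):

```python
import pandas as pd

DfCar = pd.DataFrame({'colour': ['white', 'yellow', 'black', 'purple', 'green']})
Preferredcolours = ['red', 'yellow', 'green', 'blue']

# Element-wise membership test; the result is a boolean Series.
matches = DfCar['colour'].apply(lambda x: x in Preferredcolours)

# Cast True/False to 1/0 before storing the column.
DfCar['PreferredMatching'] = matches.astype(int)

print(DfCar)
```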
