Pandas DataFrame: Create a column based on values from different rows

Question

I have a pandas dataframe which looks like this:

    Ref       Value
1   SKU1       A
2   SKU2       A           
3   SKU3       B
4   SKU2       A
5   SKU1       B
6   SKU3       C

I would like to create a new column based on whether the values for a given Ref match or not. For instance, if both rows for SKU1 have the same value, display "good"; if not, display "bad". The dataframe will usually have 2 rows for each Ref, but sometimes more (in that case, "good" means that all the values match each other).

With the example above, this would give:

    Ref       Value    NewCol
1   SKU1       A        bad
2   SKU2       A        good   
3   SKU3       B        bad
4   SKU2       A        good  
5   SKU1       B        bad
6   SKU3       C        bad

What would be the best way to implement this? In my example, Value can only be A, B or C, but Ref has thousands of different entries, which is why I am struggling.

Many thanks in advance!

SKU1 has values A and B => bad, because different rows of SKU1 have different values (inconsistent); SKU2 has the value A twice => good, because the rows for SKU2 have a consistent value.
(Sash9, commented Oct 14, 2020 at 0:27)

Quang Hoang · Accepted Answer · 2020-10-14 00:46:08Z


Let's try groupby().transform('nunique') to count the distinct values within each Ref; a Ref is "good" only when that count is 1:

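import numpy as np  # df is the frame from the question
# transform broadcasts each Ref's distinct-value count back to every one of its rows
df['NewCol'] = np.where(df.groupby('Ref')['Value'].transform('nunique') == 1,
                        'good', 'bad')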

Output:

    Ref Value NewCol
1  SKU1     A    bad
2  SKU2     A   good
3  SKU3     B    bad
4  SKU2     A   good
5  SKU1     B    bad
6  SKU3     C    bad
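Because transform('nunique') broadcasts each group's count back to its rows, the same rule covers Refs with any number of rows, not just two. The sketch below uses a hypothetical extension of the sample (SKU4, with three matching rows, is invented for illustration) and contrasts the aggregated count, one row per Ref, with the transformed one, one value per original row:

```python
import numpy as np
import pandas as pd

# Sample from the question plus a hypothetical SKU4 that has three rows
df = pd.DataFrame({
    "Ref":   ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3", "SKU4", "SKU4", "SKU4"],
    "Value": ["A",    "A",    "B",    "A",    "B",    "C",    "C",    "C",    "C"],
})

# Aggregation: one row per Ref holding its number of distinct Values
per_ref = df.groupby("Ref")["Value"].nunique()
print(per_ref)

# transform: the same count, aligned with (and as long as) the original rows
counts = df.groupby("Ref")["Value"].transform("nunique")

# All rows of a Ref agree exactly when it has a single distinct Value
df["NewCol"] = np.where(counts == 1, "good", "bad")
print(df)
```

The aggregated per_ref is handy for inspecting which Refs are inconsistent, while the transformed counts line up with df row by row, which is what np.where needs.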

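Update, per the comment below: collect the set of Values for each Ref, then use np.select to test the combinations in order (the first matching condition wins, and anything unmatched is "bad"):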

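import numpy as np

# Set of distinct Values for each Ref, mapped back onto every row
s = df['Ref'].map(df.groupby('Ref')['Value'].apply(set))

# One distinct value -> 'good'; exactly {'A', 'B'} -> 'average'; otherwise 'bad'
df['NewCol'] = np.select([s.map(len) == 1, s.eq({'A', 'B'})],
                         ['good', 'average'], 'bad')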

Output:

    Ref Value   NewCol
1  SKU1     A  average
2  SKU2     A     good
3  SKU3     B      bad
4  SKU2     A     good
5  SKU1     B  average
6  SKU3     C      bad
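To define explicitly which mixed combinations count as "average" and which as "bad" (as asked in the comments), one option is a lookup table keyed on the set of values per Ref instead of one np.select condition per combination. This is a sketch, not the answer's method: COMBO_LABELS, combo_per_ref and label are hypothetical names, and the table below encodes the rules from the first comment (A & B is "bad", A & C or B & C is "average").

```python
import pandas as pd

# Sample data from the question (index starts at 1 to match the listing)
df = pd.DataFrame(
    {"Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
     "Value": ["A", "A", "B", "A", "B", "C"]},
    index=range(1, 7),
)

# Hypothetical rule table for mixed combinations; edit it to match your own rules.
# frozenset keys make the lookup independent of the order the values appear in.
COMBO_LABELS = {
    frozenset({"A", "B"}): "bad",
    frozenset({"A", "C"}): "average",
    frozenset({"B", "C"}): "average",
}

# Distinct Values per Ref, e.g. {"SKU1": frozenset({"A", "B"}), ...}
combo_per_ref = {ref: frozenset(vals) for ref, vals in df.groupby("Ref")["Value"]}


def label(combo: frozenset) -> str:
    """'good' if every row agrees, else the table entry, defaulting to 'bad'."""
    if len(combo) == 1:
        return "good"
    return COMBO_LABELS.get(combo, "bad")


df["NewCol"] = df["Ref"].map(lambda ref: label(combo_per_ref[ref]))
print(df)
```

With these rules SKU1 becomes "bad", SKU2 "good" and SKU3 "average". A combination missing from the table (for example {A, B, C} for a Ref with three different values) falls back to "bad"; that default is a choice made in this sketch and can be changed in label.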

answered Oct 14, 2020 at 0:25 by Quang Hoang, edited Oct 14, 2020 at 0:46



2 Comments

Sash9 Over a year ago

Sounds like the function I was looking for, thanks! One more question: let's assume I want to be more precise, and instead of just having "bad" when the values don't match, I would like to display "bad" for certain combinations of values (e.g. A & B) and "average" for other combinations (e.g. B & C or A & C). Matching values would still display "good" as before. How would I need to adapt this?

Sash9 Over a year ago

Thanks. In the sample I gave, there is only one combination with value "C", and it comes out "bad", so this works. But can I somehow explicitly define which combinations of values are "average" and which are "bad", assuming that "good" is always when all values match?
