I have a pandas dataframe which looks like this:

    Ref       Value
1   SKU1       A
2   SKU2       A           
3   SKU3       B
4   SKU2       A
5   SKU1       B
6   SKU3       C           

I would like to create a new column that depends on whether the values for a given Ref all match. For instance, if both rows for SKU1 have the same value, display "good"; if not, display "bad". The dataframe will usually have 2 rows for each Ref, but sometimes more (in that case, "good" means they all match each other).

With the example above, this would be:

    Ref       Value    NewCol
1   SKU1       A        bad
2   SKU2       A        good   
3   SKU3       B        bad
4   SKU2       A        good  
5   SKU1       B        bad
6   SKU3       C        bad        

What would be the best way of implementing this? In my example, Value can only be A, B or C, but Ref has thousands of different entries, which is why I am struggling.

Many thanks in advance!

  • SKU1 has values A and B; why is one good and the other bad? Commented Oct 14, 2020 at 0:24
  • SKU1 has values A and B => bad, because different rows of SKU1 have different values (inconsistent); SKU2 has the value A twice => good, because the rows for SKU2 have a consistent value. Commented Oct 14, 2020 at 0:27

1 Answer

Let's try groupby().transform('nunique') to count the distinct values within each Ref and broadcast that count back to every row:

import numpy as np
# 'good' when a Ref has exactly one distinct Value, otherwise 'bad'
df['NewCol'] = np.where(df.groupby('Ref')['Value'].transform('nunique')==1, 
                        'good', 'bad')

Output:

    Ref Value NewCol
1  SKU1     A    bad
2  SKU2     A   good
3  SKU3     B    bad
4  SKU2     A   good
5  SKU1     B    bad
6  SKU3     C    bad
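
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption (rebuilt by hand from the sample in the question), and the intermediate name `n_distinct` is only there for readability.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (index 1..6 to match the printout)
df = pd.DataFrame(
    {
        "Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
        "Value": ["A", "A", "B", "A", "B", "C"],
    },
    index=range(1, 7),
)

# Number of distinct Values per Ref, repeated on every row of that Ref
n_distinct = df.groupby("Ref")["Value"].transform("nunique")

# Exactly one distinct value means all rows of the Ref agree
df["NewCol"] = np.where(n_distinct == 1, "good", "bad")
print(df)
```

One thing to keep in mind: nunique ignores missing values by default, so a Ref whose Values are all NaN would get a count of 0 and end up as "bad" here.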

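Update, per the comment: to treat specific combinations differently, map each Ref to the set of its values and test that set: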

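# For every row, the set of all Values seen for that row's Ref
s = df['Ref'].map(df.groupby('Ref')['Value'].apply(set))

# First matching condition wins; anything unmatched falls back to 'bad'
df['NewCol'] = np.select((s.str.len()==1, s.eq({'A','B'})),
                         ('good', 'average'), 'bad')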

Output:

    Ref Value   NewCol
1  SKU1     A  average
2  SKU2     A     good
3  SKU3     B      bad
4  SKU2     A     good
5  SKU1     B  average
6  SKU3     C      bad
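
If you would rather spell out explicitly which combination gets which label, one option is a lookup table keyed by frozenset. This is only a sketch: `combo_labels` and `label_for` are hypothetical names, and the particular assignments are placeholders chosen to mirror the output above, so swap them as needed.

```python
import pandas as pd

# Sample data rebuilt from the question (index 1..6 to match the printout)
df = pd.DataFrame(
    {
        "Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
        "Value": ["A", "A", "B", "A", "B", "C"],
    },
    index=range(1, 7),
)

# Hypothetical lookup: label for each combination of differing values.
# frozenset is hashable, so it can be used as a dict key (a plain set cannot).
combo_labels = {
    frozenset({"A", "B"}): "average",
    frozenset({"B", "C"}): "bad",
    frozenset({"A", "C"}): "bad",
}


def label_for(values):
    """Return 'good' if all values match, else look the combination up."""
    if len(values) == 1:
        return "good"
    # Combinations not listed above fall back to 'bad' (an assumption)
    return combo_labels.get(values, "bad")


# One label per Ref, computed from the set of Values in each group
per_ref = {
    ref: label_for(frozenset(vals))
    for ref, vals in df.groupby("Ref")["Value"]
}

# Broadcast the per-Ref label back onto every row
df["NewCol"] = df["Ref"].map(per_ref)
print(df)
```

Since the labels are computed once per Ref and then mapped back, this should stay cheap even with thousands of Refs, and adding a new rule is just one more dictionary entry.
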

2 Comments

Sounds like the function I was looking for, thanks! One more question: let's assume I want to be more precise and, instead of just having "bad" whenever the values don't match, display "bad" for certain combinations of values (e.g. A & B) and "average" for other combinations (e.g. B & C or A & C). Matching values would still display "good" as before. How would I need to adapt this?
Thanks. In the sample I gave, there is only the value "C" and it's "bad", so this works, but can I somehow explicitly define which combination of values is average and which combination is bad? Assuming that "good" is always when all values match.
