Comparing multiple columns in a DataFrame (more than 2)

Question

I have a data frame

import pandas as pd

data = pd.DataFrame({'student': ['a', 'b', 'c'],
                     'rank': [2, 2, 1],
                     'rank1': [3, 3, 2],
                     'rank2': [4, 2, 3]})

My code:

import numpy as np

data['Diff'] = np.where((data['rank'] != data['rank1']) &
                        (data['rank1'] != data['rank2']), '1', '0')

Requirement: Diff should be 1 only when all the ranks in a row are different, otherwise 0. But student b (ranks 2, 3, 2) is also getting 1.

Shubham Sharma · Accepted Answer · 2021-06-05 16:31:33Z

5

We can filter the rank-like columns, then use nunique along axis=1 and check whether the number of unique values in each row equals the number of columns:

r = data.filter(like='rank')
# astype(int) turns True/False into 1/0 (Series.view is deprecated in recent pandas)
data['diff'] = r.nunique(axis=1).eq(r.shape[1]).astype(int)

  student  rank  rank1  rank2  diff
0       a     2      3      4     1
1       b     2      3      2     0
2       c     1      2      3     1
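To make the intermediate steps of this approach visible, here is a minimal self-contained sketch using the data from the question. The names n_unique and all_distinct are introduced only for illustration and are not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({'student': ['a', 'b', 'c'],
                     'rank': [2, 2, 1],
                     'rank1': [3, 3, 2],
                     'rank2': [4, 2, 3]})

# Keep only the columns whose name contains 'rank'
r = data.filter(like='rank')

# Number of distinct values in each row (illustrative name)
n_unique = r.nunique(axis=1)

# A row has all-different ranks when that count equals the number of columns
all_distinct = n_unique.eq(r.shape[1])

data['diff'] = all_distinct.astype(int)
print(n_unique.tolist())
print(data)
```

Because the check compares against r.shape[1], it should keep working if more rank columns are added, as long as their names contain 'rank'.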

Andrej Kesely · Accepted Answer · 2021-06-05 16:27:13Z

1

You can use set() and check whether the length of the set built from each row's values equals 3:

data["Diff"] = (
    data[["rank", "rank1", "rank2"]]
    .apply(lambda x: len(set(x)) == 3, axis=1)
    .astype(int)
)
print(data)

Prints:

  student  rank  rank1  rank2  Diff
0       a     2      3      4     1
1       b     2      3      2     0
2       c     1      2      3     1
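As for why the np.where attempt in the question marks student b as 1: it compares only rank with rank1 and rank1 with rank2, so the pair rank and rank2 (2 and 2 for b) is never checked. A rough sketch that keeps the np.where style but tests every pair of columns follows; the cols list and the all_differ name are assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd

data = pd.DataFrame({'student': ['a', 'b', 'c'],
                     'rank': [2, 2, 1],
                     'rank1': [3, 3, 2],
                     'rank2': [4, 2, 3]})

cols = ['rank', 'rank1', 'rank2']  # columns to compare (illustrative)

# Start with True for every row, then require each pair of columns to differ
all_differ = np.ones(len(data), dtype=bool)
for left, right in combinations(cols, 2):
    all_differ &= (data[left] != data[right]).to_numpy()

data['Diff'] = np.where(all_differ, 1, 0)
print(data)
```

The number of pairs grows quadratically with the number of columns, so for many columns the set or nunique based checks are likely the simpler choice.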

BENY · Accepted Answer · 2021-06-05 17:05:13Z

1

Let us try pd.Series.unique with str.len:

data['new'] = data.filter(like='rank').apply(pd.Series.unique,1).str.len().eq(3).astype(int)

Out[45]: 
0    1
1    0
2    1
dtype: int64

edited Jun 5, 2021 at 17:05

answered Jun 5, 2021 at 16:36

1 Comment

Henry Ecker Over a year ago

diff compared to 0 would also only find consecutive duplicates no?
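The revision this comment refers to is not shown here, so the following is only a guess at the kind of check it means: comparing a row-wise diff to 0. Under that assumption, the sketch below illustrates the point, since diff only looks at neighbouring columns and misses student b, whose equal values (rank and rank2) are not adjacent.

```python
import pandas as pd

data = pd.DataFrame({'student': ['a', 'b', 'c'],
                     'rank': [2, 2, 1],
                     'rank1': [3, 3, 2],
                     'rank2': [4, 2, 3]})

r = data.filter(like='rank')

# Assumed diff-based check: a zero difference means two adjacent columns are equal
adjacent_dup = r.diff(axis=1).eq(0).any(axis=1)

# nunique-based check: fewer unique values than columns means some duplicate exists
any_dup = r.nunique(axis=1).lt(r.shape[1])

print(adjacent_dup.tolist())
print(any_dup.tolist())
```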
