Pandas List of Values to Binary Columns

Question

I have a dataframe of users who purchased various items. I want to break out that list of values into separate columns, with a binary flag for each item showing whether the user purchased it.

Input:

       A           B
0  James  [123, 456]
1   Mary       [123]
2   John  [456, 789]

Expected Output:

       A           B  123  456  789
0  James  [123, 456]    1    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    1

What I've tried (step by step)

My first step is df['B'].explode().

Then I use get_dummies(), pd.get_dummies(df['B'].explode()):

   123  456  789
0    1    0    0
0    0    1    0
1    1    0    0
2    0    1    0
2    0    0    1

Then I join it back on the index, df.join(pd.get_dummies(df['B'].explode())):

       A           B  123  456  789
0  James  [123, 456]    1    0    0
0  James  [123, 456]    0    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    0
2   John  [456, 789]    0    0    1

Problem:

Now I just need to group by the index and combine. However, with millions of rows and customers buying hundreds of products, this way of joining and combining is highly inefficient. Is there a more "pandas-friendly" or built-in function that does this?

For performance, you can also try a solution using MultiLabelBinarizer.
– anky, Commented Apr 7, 2020 at 14:19

Quang Hoang · Accepted Answer · 2020-04-07 14:03:59Z


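You can replace pd.get_dummies(df['B'].explode()) with pd.get_dummies(df['B'].explode()).groupby(level=0).sum() and then join. (The answer originally used .sum(level=0); the level argument was deprecated in pandas 1.3 and removed in 2.0, and groupby(level=0).sum() is the equivalent.)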
