
I have a dataframe like the one below:

    name    item 
0   Jack    A
1   Sarah   B
2   Ross    A
3   Sean    C
4   Jack    C
5   Ross    B

What I would like to do is produce a dictionary that maps each person to the items they are related to, as a list of indicators over items A, B, and C:

{'Jack': [1, 0, 1], 'Sarah': [0, 1, 0], 'Ross': [1, 1, 0], 'Sean': [0, 0, 1]}

I feel like this should be fairly easy to do with pandas groupby.

I have tried looping through the dataframe, but it has more than 1e7 rows, and looping does not look efficient enough.

1 Answer


You can do this with pd.crosstab followed by to_dict:

# orient='list' is spelled out because the short alias 'l' was removed in pandas 2.0
pd.crosstab(df['item'], df['name']).to_dict('list')
{'Jack': [1, 0, 1], 'Ross': [1, 1, 0], 'Sarah': [0, 1, 0], 'Sean': [0, 0, 1]}
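
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample dataframe from the question and applies the crosstab approach. The variable names `table` and `result` are my own, and the position of each value in a list follows the sorted item labels (A, B, C).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["Jack", "Sarah", "Ross", "Sean", "Jack", "Ross"],
    "item": ["A", "B", "A", "C", "C", "B"],
})

# Rows are items (sorted: A, B, C), columns are names; each cell is a count.
table = pd.crosstab(df["item"], df["name"])

# orient="list" maps each column label (a name) to that column's values.
result = table.to_dict("list")
print(result)
# {'Jack': [1, 0, 1], 'Ross': [1, 1, 0], 'Sarah': [0, 1, 0], 'Sean': [0, 0, 1]}
```

Note that crosstab counts occurrences, so a (name, item) pair that appears twice would produce a 2 rather than a 1.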

Another interesting option is using str.get_dummies:

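# sum(level=0) and max(level=0) were removed in pandas 2.0; group on the index level instead
# if you need counts
df.set_index('item')['name'].str.get_dummies().groupby(level=0).sum().to_dict('list')
# if you want 0/1 indicators
df.set_index('item')['name'].str.get_dummies().groupby(level=0).max().to_dict('list')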
# {'Jack': [1, 0, 1], 'Ross': [1, 1, 0], 'Sarah': [0, 1, 0], 'Sean': [0, 0, 1]}
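
The counts and the indicators only differ when a (name, item) pair repeats. A small sketch of that, assuming pandas 2.x: the extra ("Jack", "A") row is hypothetical and not part of the question's data, and `dummies`, `counts`, and `flags` are names I picked for illustration.

```python
import pandas as pd

# Data from the question, plus one hypothetical repeated pair ("Jack", "A")
# added only to show where counts and indicators diverge.
df = pd.DataFrame({
    "name": ["Jack", "Sarah", "Ross", "Sean", "Jack", "Ross", "Jack"],
    "item": ["A", "B", "A", "C", "C", "B", "A"],
})

# One 0/1 column per name, indexed by item (the index contains duplicates).
dummies = df.set_index("item")["name"].str.get_dummies()

# Collapse the duplicate index labels: sum gives counts, max gives indicators.
counts = dummies.groupby(level=0).sum().to_dict("list")
flags = dummies.groupby(level=0).max().to_dict("list")

print(counts["Jack"])  # [2, 0, 1]
print(flags["Jack"])   # [1, 0, 1]
```

With the original six rows, both calls return the same dictionary.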