I have a pandas DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame([['a', 2, 3], ['a,b', 5, 6], ['c', 8, 9]])
>>> df
     0  1  2
0    a  2  3
1  a,b  5  6
2    c  8  9

I want to spread the first column into n columns, where n is the number of unique comma-separated values (3 in this case). Each resulting column should be 1 if the value is present in that row and 0 otherwise. The expected result is:

   1  2  a  c  b
0  2  3  1  0  0
1  5  6  1  0  1
2  8  9  0  1  0

I came up with the following code, but it seems a bit circuitous to me.

>>> import re
>>> dfSpread = pd.get_dummies(df[0].str.split(',', expand=True)).rename(
...     columns=lambda x: re.sub('.*_', '', x))
>>> pd.concat([df.iloc[:, 1:], dfSpread], axis=1)

Is there a built-in function that does exactly this and that I have overlooked?

2 Answers

Using Series.str.get_dummies:

df.set_index([1,2])[0].str.get_dummies(',').reset_index()
Out[229]: 
   1  2  a  b  c
0  2  3  1  0  0
1  5  6  1  1  0
2  8  9  0  0  1
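
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (indexed, dummies, result) are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame([['a', 2, 3], ['a,b', 5, 6], ['c', 8, 9]])

# Step 1: park the columns to keep (1 and 2) in the index and select column 0.
indexed = df.set_index([1, 2])[0]

# Step 2: build one indicator column per comma-separated value.
dummies = indexed.str.get_dummies(sep=',')

# Step 3: turn the index levels back into ordinary columns.
result = dummies.reset_index()
print(result)
```

This should print the same table as the one-liner. One thing to keep in mind: every column other than the one being split ends up in the index, so with many columns the list passed to set_index has to name all of them.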

1 Comment

At least I picked the right function... You're essentially setting all other columns as the index to 'save' the information, applying the function, and then resetting the index. That's a great, thought-provoking idea. Thanks! (Waiting 8 more minutes to accept your answer.)

You can use pop + concat here for an alternative version of Wen's answer.

pd.concat([df, df.pop(df.columns[0]).str.get_dummies(sep=',')], axis=1)

   1  2  a  b  c
0  2  3  1  0  0
1  5  6  1  1  0
2  8  9  0  0  1
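
Note that pop removes the column from df in place, so df no longer has its first column afterwards. If the original frame should stay untouched, a possible variant (a sketch, not part of the original answer) uses drop and join instead; the names first and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 2, 3], ['a,b', 5, 6], ['c', 8, 9]])

# Label of the column holding the comma-separated values.
first = df.columns[0]

# drop returns a new frame, so df itself keeps all three columns.
result = df.drop(columns=first).join(df[first].str.get_dummies(sep=','))
print(result)
```

The join aligns on the row index, which both frames share here because the indicator columns are derived from df itself.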
