
I have a pandas DataFrame in which the columns contain lists of values, like the one below:

            A                           B                           
0   ['x','x','y','y','z']           ['m','m','n','n','p']

I would like to create a separate column for each unique item in the lists and put the count of each item in that column:

            A                           B                       x   y   z   m   n   p           
0   ['x','x','y','y','z']           ['m','m','n','n','p']       2   2   1   2   2   1  

Can someone help me write the code for this?
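The per-row computation asked for here can be sketched in plain Python before bringing pandas in: flatten every list in a row and count the items with `collections.Counter`. The `row` literal below just mirrors the sample row; it is an illustration, not part of the original question.

```python
from collections import Counter
from itertools import chain

# One row of the sample frame: the lists held in columns A and B.
row = {"A": ["x", "x", "y", "y", "z"], "B": ["m", "m", "n", "n", "p"]}

# Flatten all lists in the row, then count each distinct item.
counts = Counter(chain.from_iterable(row.values()))
print(dict(counts))  # {'x': 2, 'y': 2, 'z': 1, 'm': 2, 'n': 2, 'p': 1}
```

Each answer below is essentially a way of doing this for every row and turning the resulting counts into new columns.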

  • What is the output if you have more than one row? Commented Dec 4, 2019 at 20:14

4 Answers


Use:

pd.concat([df, df.stack().explode().value_counts().to_frame().T.reset_index(drop=True)], axis=1)

Output:

                 A                B  m  x  y  n  z  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  2  2  1  1

If you want to keep the order in which the items first appear in the lists:

s = df.stack().explode()
pd.concat([df, s.value_counts().reindex(s.drop_duplicates()).to_frame().T.reset_index(drop=True)], axis=1)

                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1

For more than one row:

pd.concat([df,df.stack().explode().groupby(level=0).value_counts().unstack()],axis=1)

                 A                B    m    n    p    q    x    y    z
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2.0  1.0  NaN  2.0  2.0  1.0
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  2.0  2.0  1.0  NaN  4.0  1.0
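A variant of the multi-row version, assuming you would rather see 0 than NaN for items that do not occur in a row: `Series.unstack` accepts `fill_value`, which also avoids the float conversion that NaN forces on the count columns. The two-row frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Two-row sample matching the frame shown above.
df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

# stack -> one list per (row, column); explode -> one item per entry;
# groupby(level=0) counts the items within each original row.
counts = (
    df.stack()
      .explode()
      .groupby(level=0)
      .value_counts()
      .unstack(fill_value=0)  # items absent from a row become 0, not NaN
)

result = pd.concat([df, counts], axis=1)
print(result)
```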

3 Comments

If the DataFrame has multiple rows, this solution sums everything into a single row. I don't know whether that is the OP's intention, because they haven't clarified it.
I have already added an alternative for that case; it is very simple.
I was just pointing it out, not criticizing. I know your solution could easily be extended to cover that case. It is perfect now. Upvoted :) +1

This does it for you:

import pandas as pd

df = pd.DataFrame([[0, ['x','x','y','y','z'], ['m','m','n','n','p']]], columns=['index', 'A', 'B'])

# get all unique values across both list columns
unique_vals = set([i for l in df['A'] for i in l] + [i for l in df['B'] for i in l])
for val in unique_vals:
    # count occurrences of val across all list columns, row by row
    df[val] = df[['A', 'B']].apply(lambda row: sum(row[i].count(val) for i in row.index), axis=1)

Output

print(df.to_string())

   index                A                B  m  x  p  n  y  z
0      0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
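The new columns above come out in `set` iteration order, which is arbitrary. If you want them in the order the items first appear (x, y, z, m, n, p), one option, sketched below, is to collect the values with `dict.fromkeys`, which deduplicates while preserving insertion order; the counting loop is otherwise the same idea.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, ["x", "x", "y", "y", "z"], ["m", "m", "n", "n", "p"]]],
    columns=["index", "A", "B"],
)

# dict.fromkeys keeps first-appearance order, unlike set().
unique_vals = list(dict.fromkeys(
    item for col in ["A", "B"] for lst in df[col] for item in lst
))

for val in unique_vals:
    # Count val across the A and B lists of each row; v=val binds the
    # current value explicitly instead of relying on the loop variable.
    df[val] = df[["A", "B"]].apply(
        lambda row, v=val: sum(lst.count(v) for lst in row), axis=1
    )

print(df.to_string())
```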

Comments


I assume your real data has more than one row, so I use collections.Counter to build a new DataFrame and join it back.

On your sample df:

from collections import Counter

df_t = pd.DataFrame(df.sum(1).map(Counter).tolist())
df_final = df.join(df_t)

Out[109]:
                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1

On a sample DataFrame with more than one row:

df_more
Out[110]:
                 A                B
0  [x, x, y, y, z]  [m, m, n, n, p]
1  [y, y, y, y, z]  [p, q, n, n, p]

from collections import Counter

df_t = pd.DataFrame(df_more.sum(1).map(Counter).tolist())
df_final = df_more.join(df_t)

Out[115]:
                 A                B    x  y  z    m  n  p    q
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2  1  2.0  2  1  NaN
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  4  1  NaN  2  2  1.0
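In this answer, `pd.DataFrame(...tolist())` gets a fresh 0..n-1 `RangeIndex`, so the `join` only lines up when `df_more` has that same default index. A sketch, assuming you want it to work with any index and with 0 instead of NaN: pass `index=df_more.index` and fill the gaps. It also swaps `df.sum(1)` for an explicit `zip` over the two list columns; the row labels `r1`/`r2` are hypothetical, chosen only to show a non-default index.

```python
from collections import Counter

import pandas as pd

df_more = pd.DataFrame(
    {
        "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
        "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
    },
    index=["r1", "r2"],  # hypothetical non-default labels
)

# One Counter per row; a + b concatenates the two lists.
row_counts = [Counter(a + b) for a, b in zip(df_more["A"], df_more["B"])]

# Reuse df_more's index so join aligns row by row, and turn the NaN
# left for absent items into integer zeros.
df_t = pd.DataFrame(row_counts, index=df_more.index).fillna(0).astype(int)
df_final = df_more.join(df_t)
print(df_final)
```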

Comments


You can use itertools.chain.from_iterable and collections.Counter:

from collections import Counter
from itertools import chain

df.join(df.apply(lambda x: pd.Series(Counter(chain.from_iterable(x))), axis=1))
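This works as long as every column of `df` holds a list: `chain.from_iterable` iterates over each value in the row, so a scalar column (such as an integer ID) would raise a `TypeError`. A sketch, assuming a hypothetical extra `id` column, that applies the same idea only to the list columns and fills absent items with 0:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical frame with an extra non-list column ("id").
df = pd.DataFrame({
    "id": [10, 11],
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

list_cols = ["A", "B"]  # only these columns hold lists

# Each row yields a Series of counts; apply expands them into columns.
counts = df[list_cols].apply(
    lambda row: pd.Series(Counter(chain.from_iterable(row))), axis=1
)
result = df.join(counts.fillna(0).astype(int))
print(result)
```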

Comments
