pandas dataframe columns with list values

Question

I have a pandas DataFrame whose columns contain lists of values, like this:

            A                           B                           
0   ['x','x','y','y','z']           ['m','m','n','n','p']

I would like to create a separate column for each unique item in the lists and put the count of each item under those new columns.

            A                           B                       x   y   z   m   n   p           
0   ['x','x','y','y','z']           ['m','m','n','n','p']       2   2   1   2   2   1

Can someone help in writing the code for this?

What is the output if you have more than one row?

– Andy L., Commented Dec 4, 2019 at 20:14

ansev · Accepted Answer · 2019-12-04 23:54:17Z

4

Use:

pd.concat([df,df.stack().explode().value_counts().to_frame().T],axis=1)

Output:

                 A                B  m  x  y  n  z  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  2  2  1  1
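For anyone who wants to run this end to end, here is a minimal sketch on the sample frame from the question. One assumption to flag: as far as I know, pandas 2.0 and later name the result of `value_counts()` "count", so the transposed frame gets the row label "count" instead of 0 and `pd.concat` no longer lines it up with `df`. Relabelling the row, as done below, is one way around that and is not part of the original one-liner.

```python
import pandas as pd

# Sample frame from the question: one row, two list-valued columns.
df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"]],
})

# stack() yields one list per (row, column) pair; explode() yields one item per entry.
items = df.stack().explode()

# Tally all items and turn the tally into a one-row frame.
counts = items.value_counts().to_frame().T

# Assumption: on pandas 2.0+ that row is labelled "count" rather than 0,
# so relabel it to match df before concatenating side by side.
counts.index = df.index

result = pd.concat([df, counts], axis=1)
print(result)
```

The order of the count columns follows `value_counts()`, so items with equal counts may come out in a different order than shown above.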

If you want to keep the order of the list:

s=df.stack().explode()
pd.concat([df,s.value_counts().reindex(s.drop_duplicates()).to_frame().T],axis=1)

                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
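The ordering step on its own can be sketched like this. It swaps `drop_duplicates()` for `unique()`, which also returns values in order of first appearance; that swap is my choice for the illustration, not something the answer requires.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"]],
})

items = df.stack().explode()

# value_counts() sorts by frequency; reindexing by unique() restores
# the order in which the items first appear in the lists.
ordered = items.value_counts().reindex(items.unique())

print(list(ordered.index))  # ['x', 'y', 'z', 'm', 'n', 'p']
print(ordered.tolist())     # [2, 2, 1, 2, 2, 1]
```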

More than one row:

pd.concat([df,df.stack().explode().groupby(level=0).value_counts().unstack()],axis=1)

                 A                B    m    n    p    q    x    y    z
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2.0  1.0  NaN  2.0  2.0  1.0
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  2.0  2.0  1.0  NaN  4.0  1.0
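The NaN entries are what turn the counts into floats. If zeros are preferable for items a row does not contain, one option (a sketch, not part of the answer as posted) is to pass `fill_value=0` to `unstack()`:

```python
import pandas as pd

# Two-row frame matching the output above.
df_more = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

items = df_more.stack().explode()

# Count items per original row (index level 0), then pivot the items into
# columns; fill_value=0 stands in for items missing from a row.
per_row = items.groupby(level=0).value_counts().unstack(fill_value=0)

result = pd.concat([df_more, per_row], axis=1)
print(result)
```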

edited Dec 4, 2019 at 23:54

answered Dec 4, 2019 at 19:22

ansev


3 Comments

Andy L. Over a year ago

If the dataframe has multiple rows, this solution will sum everything into one single row. I don't know whether that is the OP's intention, because the question doesn't clarify it.

ansev Over a year ago

I have already proposed an alternative for that very simple case

Andy L. Over a year ago

I just pointed it out. It is not a criticism. I know your solution would easily expand to cover that case. It is perfect now. Upvoted :) +1

Brandon · Accepted Answer · 2019-12-04 19:20:29Z

1

This does it for you:

import pandas as pd

df = pd.DataFrame([[0, ['x','x','y','y','z'], ['m','m','n','n','p']]], columns=['index', 'A', 'B'])

# Collect every unique value that appears in the lists of columns A and B.
unique_vals = set([i for l in df['A'] for i in l] + [i for l in df['B'] for i in l])
for val in unique_vals:
    # Count occurrences of val across both list columns, row by row.
    df[val] = df[['A', 'B']].apply(lambda row: sum([row[i].count(val) for i in row.index]), axis=1)

Output

print(df.to_string())

   index                A                B  m  x  p  n  y  z
0      0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
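The same loop can be wrapped in a small helper so the list columns are not hard-coded. This is only a sketch: `add_item_counts` and its `list_cols` parameter are names made up for illustration, and sorting the unique values is an added choice to get a stable column order, since iterating a bare `set` gives no such guarantee.

```python
import pandas as pd

def add_item_counts(frame, list_cols):
    """Return a copy of frame with one count column per distinct list item."""
    out = frame.copy()
    # Sorting gives a deterministic column order; a bare set does not.
    unique_vals = sorted({item for col in list_cols for cell in frame[col] for item in cell})
    for val in unique_vals:
        # For each row, add up how often val occurs in each of the list columns.
        out[val] = frame[list_cols].apply(
            lambda row: sum(cell.count(val) for cell in row), axis=1
        )
    return out

df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"]],
})
out = add_item_counts(df, ["A", "B"])
print(out.to_string())
```

Because each row is counted separately, this also works unchanged when the frame has more than one row.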

answered Dec 4, 2019 at 19:20

Brandon


Andy L. · Accepted Answer · 2019-12-04 23:23:18Z

I assume your real data has more than one row, so I use collections.Counter to construct a new dataframe and join it back.

On your sample df:

from collections import Counter

df_t = pd.DataFrame(df.sum(1).map(Counter).tolist())
df_final = df.join(df_t)

Out[109]:
                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1

On a sample dataframe with more than one row:

df_more
Out[110]:
                 A                B
0  [x, x, y, y, z]  [m, m, n, n, p]
1  [y, y, y, y, z]  [p, q, n, n, p]

from collections import Counter

df_t = pd.DataFrame(df_more.sum(1).map(Counter).tolist())
df_final = df_more.join(df_t)

Out[115]:
                 A                B    x  y  z    m  n  p    q
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2  1  2.0  2  1  NaN
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  4  1  NaN  2  2  1.0
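The mixed float and int columns come from the NaN values. If you would rather keep "missing" distinct from zero but still have integer counts, one possibility is the nullable `Int64` dtype. The sketch below assumes that is acceptable for your use case; it also builds the per-row Counters with `itertools.chain` instead of `df.sum(1)`, purely so the example does not depend on how pandas sums object columns.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df_more = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

# One Counter per row, tallying the items from every list in that row.
row_counters = [
    Counter(chain.from_iterable(row))
    for row in df_more.itertuples(index=False)
]

# Missing items become NaN; casting to the nullable Int64 dtype keeps them
# as <NA> while the actual counts stay integers.
df_t = pd.DataFrame(row_counters, index=df_more.index).astype("Int64")
df_final = df_more.join(df_t)
print(df_final)
```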

Mykola Zotko · Accepted Answer · 2019-12-04 20:26:11Z

0

You can use the functions chain.from_iterable and Counter:

from collections import Counter
from itertools import chain

df.join(df.apply(lambda x: pd.Series(Counter(chain.from_iterable(x))), axis=1))
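Since no output is shown for this one, here is a runnable sketch of the same call on a two-row frame. The sample data and the helper name `count_row` are assumptions added for illustration.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

def count_row(row):
    # chain.from_iterable flattens the row's lists into one stream of items;
    # Counter tallies them, and pd.Series turns the tally into labelled counts.
    return pd.Series(Counter(chain.from_iterable(row)))

# apply(..., axis=1) calls count_row once per row and aligns the results by item.
counts = df.apply(count_row, axis=1)
result = df.join(counts)
print(result)
```

Items that do not occur in a row come out as NaN, so the affected count columns are floats.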

answered Dec 4, 2019 at 20:26

Mykola Zotko
