How to transform a dataframe with a column whose values are lists to a dataframe where each element of each list in that column becomes a new row

Question

I have a dataframe with entries in this format:

user_id,item_list
0,3569 6530 4416 5494 6404 6289 10227 5285 3601 3509 5553 14879 5951 4802 15104 5338 3604 2345 9048 8627
1,16148 8470 7671 8984 9795 6811 3851 3611 7662 5034 5301 6948 5840 345 14652 10729 8429 7295 4949 16144
...

Note that user_id is a regular column, not the index of the dataframe.

I want to transform the dataframe into one that looks like this:

user_id,item_id
0,3569
0,6530
0,4416
0,5494
...
1,4949
1,16144
...

Right now I am trying this but it is wildly inefficient:

import numpy as np
import pandas as pd

df = pd.read_csv("20recs.csv")
numberOfRows = 28107*20
df2 = pd.DataFrame(index=np.arange(0, numberOfRows),columns=('user', 'item'))
i = 0  # position of the next row to fill in df2
for index, row in df.iterrows():
    user = row['user_id']
    itemList = row['item_list']
    items = itemList.split(' ')
    for item in items:
        df2.loc[i] = [user]+[item]
        i = i + 1

As you can see, I even tried pre-allocating the memory for the dataframe but it doesn't seem to help much.

So there must be a much better way to do this. Can anyone help me?

mcsoini · Accepted Answer · 2019-12-01 18:16:35Z

Use split to turn the space-separated strings into actual lists, then explode to ... well, explode the DataFrame. This requires pandas >= 0.25.0.

>>> df = pd.DataFrame({'user_id': [0,1], 'item_list': ['1 2 3', '4 5 6']})
>>> df

   user_id item_list
0        0     1 2 3
1        1     4 5 6

>>> (df.assign(item_id=df.item_list.apply(lambda x: x.split(' ')))
       .explode('item_id')[['user_id', 'item_id']])

   user_id   item_id
0        0         1
0        0         2
0        0         3
1        1         4
1        1         5
1        1         6
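If you are stuck on a pandas version older than 0.25.0, where explode is not available, a rough equivalent can be sketched with NumPy. This is only an illustration on the same toy frame as above, not part of the original answer; the variable names lists and out are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user_id': [0, 1], 'item_list': ['1 2 3', '4 5 6']})

# Split each space-separated string into a list of string tokens.
lists = df['item_list'].str.split(' ')

out = pd.DataFrame({
    # Repeat each user_id once per item in that user's list.
    'user_id': np.repeat(df['user_id'].values, lists.str.len().values),
    # Flatten all lists into one array and convert the tokens to integers.
    'item_id': np.concatenate(lists.tolist()).astype(int),
})
```

Both columns are built in one vectorized step each, so this avoids the row-by-row .loc assignment from the question.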

edited Dec 1, 2019 at 18:16

answered Dec 1, 2019 at 18:10

SchwarzeHuhn · Accepted Answer · 2019-12-01 18:18:13Z

First, your item_list column should hold actual lists rather than space-separated strings. Split each string on whitespace, then convert the items to integers:

df['item_id_list'] = df['item_list'].str.split()
df['item_id_list_int'] = [[int(i) for i in x] for x in df['item_id_list']]

Then explode it:

df_exp = df.explode('item_id_list_int')
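One detail worth knowing: explode generally leaves the exploded column with object dtype, even when the lists contained integers. A minimal sketch, assuming a small made-up frame in the question's format, that converts to integers after exploding and also resets the repeated index:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [0, 1], 'item_list': ['1 2 3', '4 5 6']})

# Split on whitespace, then give every list element its own row.
exploded = df.assign(item_id=df['item_list'].str.split()).explode('item_id')

result = (
    exploded[['user_id', 'item_id']]
    .astype({'item_id': int})   # convert the exploded values to integers
    .reset_index(drop=True)     # replace the repeated index 0,0,0,1,1,1 with 0..5
)
```

Converting once after the explode avoids the per-row list comprehension, at the cost of one extra astype call.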

edited Dec 1, 2019 at 18:18

answered Dec 1, 2019 at 18:12

oppressionslayer · Accepted Answer · 2019-12-01 18:33:25Z

Try this:

df.set_index('user_id').item_list.apply(lambda x: x.split(' ')).explode().reset_index().set_index('user_id')

Output:

        item_list
user_id          
0            3569
0            6530
0            4416
0            5494
0            6404
0            6289
0           10227
0            5285
0            3601
0            3509
0            5553
0           14879
0            5951
0            4802
0           15104
0            5338
0            3604
0            2345
0            9048
0            8627
1           16148
1            8470
1            7671
1            8984
1            9795
1            6811
1            3851
1            3611
1            7662
1            5034
1            5301
1            6948
1            5840
1             345
1           14652
1           10729
1            8429
1            7295
1            4949
1           16144

Or, if you want user_id as a regular column with a default integer index:

df.set_index('user_id').item_list.apply(lambda x: x.split(' ')).explode().reset_index()
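The result above keeps the column name item_list and string values. A possible variation, shown here on a small made-up frame, renames the exploded Series to item_id and casts it to integers so that the output matches the layout requested in the question:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [0, 1], 'item_list': ['1 2 3', '4 5 6']})

result = (
    df.set_index('user_id')['item_list']
      .str.split()          # each string becomes a list of tokens
      .explode()            # one row per token, user_id repeated in the index
      .astype(int)          # tokens are strings; convert them to integers
      .rename('item_id')    # name the Series after the desired column
      .reset_index()        # move user_id back to a regular column
)
```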
