I have a dataframe with entries in this format:
user_id,item_list
0,3569 6530 4416 5494 6404 6289 10227 5285 3601 3509 5553 14879 5951 4802 15104 5338 3604 2345 9048 8627
1,16148 8470 7671 8984 9795 6811 3851 3611 7662 5034 5301 6948 5840 345 14652 10729 8429 7295 4949 16144
...
*Note that user_id is a regular column, not the index of the dataframe.
I want to transform the dataframe into one that looks like this:
user_id,item_id
0,3569
0,6530
0,4416
0,5494
...
1,4949
1,16144
...
Right now I am trying this, but it is wildly inefficient:
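import numpy as np
import pandas as pd

df = pd.read_csv("20recs.csv")
numberOfRows = 28107 * 20  # one output row per (user, item) pair
# Pre-allocate the output dataframe
df2 = pd.DataFrame(index=np.arange(0, numberOfRows), columns=('user', 'item'))
i = 0  # next output row to fill (renamed from iter, which shadows the builtin)
for index, row in df.iterrows():
    user = row['user_id']
    itemList = row['item_list']
    items = itemList.split(' ')
    for item in items:
        # Row-by-row assignment into df2
        df2.loc[i] = [user] + [item]
        i = i + 1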
As you can see, I even tried pre-allocating the dataframe, but it doesn't seem to help much.
There must be a much better way to do this. Can anyone help me?
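For reference, here is a minimal sketch of the kind of vectorized transformation I am after. It assumes pandas 0.25 or later (for DataFrame.explode) and uses a small inline sample in place of 20recs.csv; the names csv_text and long_df are only illustrative. I have not timed it on the full file.

```python
import io

import pandas as pd

# Inline stand-in for 20recs.csv (assumed layout: user_id,item_list),
# shortened to three items per user for illustration.
csv_text = """user_id,item_list
0,3569 6530 4416
1,16148 8470 7671
"""
df = pd.read_csv(io.StringIO(csv_text))

long_df = (
    # Split each space-separated string into a list of item strings
    df.assign(item_id=df["item_list"].str.split())
      .drop(columns="item_list")
      # Give every list element its own row, repeating user_id
      .explode("item_id")
      # The exploded values are still strings; convert them to integers
      .astype({"item_id": "int64"})
      .reset_index(drop=True)
)

print(long_df)
```

This avoids the Python-level loop and the repeated df2.loc assignments entirely; the split and the reshaping each happen in a single pandas call.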