pandas dataframe manipulation without using loop

Question

Please find the input and expected output below. For each store id and period id, 11 items should be present; if any item is missing, add a row for it and fill that row with 0, without using a loop.

Any help is highly appreciated.

input

Expected Output

Pygirl · Accepted Answer · 2020-02-20 19:19:22Z

You can do this:

Sample df:

import pandas as pd

df = pd.DataFrame({'store_id': [1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962,1160962],
                   'period_id': [1025,1025,1025,1025,1025,1025,1026,1026,1026,1026,1026],
                   'item_x': [1,4,5,6,7,8,1,2,5,6,7],
                   'z': [1,4,5,6,7,8,1,2,5,6,7]})

Solution:

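num = range(1, 12)  # items 1..11 that every group should contain

def f(x):
    # x holds one (store_id, period_id) group, indexed by item_x.
    # Add rows for the missing items filled with 0, then restore the group's keys.
    return x.reindex(num, fill_value=0).assign(
        store_id=x['store_id'].mode()[0],
        period_id=x['period_id'].mode()[0],
    )

df.set_index('item_x').groupby(['store_id', 'period_id'], group_keys=False).apply(f).reset_index()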

answered Feb 20, 2020 at 19:19

1 Comment

filbranden Over a year ago

This doesn't seem to work correctly: the item_x values for the first period are respected, but those for the second period are not (they are copied from the first). It seems to me that using reindex on a groupby might be the source of the issue...
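One way to test a result for the symptom described in this comment is to confirm that every original row still appears unchanged in it. The sketch below is only an illustration: the helper name `original_rows_preserved` and the small frames are made up for the example and do not come from the thread.

```python
import pandas as pd


def original_rows_preserved(original, result):
    """Return True if every row of `original` appears unchanged in `result`."""
    # Merge on all columns; rows of `original` with no exact match in
    # `result` are flagged "left_only" in the indicator column.
    merged = original.merge(
        result, on=list(original.columns), how="left", indicator=True
    )
    return bool((merged["_merge"] == "both").all())


# Made-up data: two periods, items 1..2 expected in each.
original = pd.DataFrame(
    {"store_id": [1, 1], "period_id": [10, 11], "item_x": [1, 2], "z": [5, 6]}
)
good = pd.DataFrame(
    {"store_id": [1, 1, 1, 1], "period_id": [10, 10, 11, 11],
     "item_x": [1, 2, 1, 2], "z": [5, 0, 0, 6]}
)
# Second period's values copied from the first, as the comment describes.
bad = good.assign(z=[5, 0, 5, 0])

good_ok = original_rows_preserved(original, good)
bad_ok = original_rows_preserved(original, bad)
```

Here `good_ok` is True and `bad_ok` is False, because the row for period 11, item 2 no longer carries its original value in `bad`.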

Georgina Skibinski · Accepted Answer · 2020-02-20 19:20:38Z

You can do:

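from itertools import product

import pandas as pd

# Every existing (store_id, period_id) pair combined with every Item 1..11
pdindex = product(df.groupby(["store_id", "period_id"]).groups, range(1, 12))
pdindex = pd.MultiIndex.from_tuples(map(lambda x: (*x[0], x[1]), pdindex), names=["store_id", "period_id", "Item"])

df = df.set_index(["store_id", "period_id", "Item"])

# Empty frame on the full index, filled from the original rows; remaining NaN become 0
res = pd.DataFrame(index=pdindex, columns=df.columns)
res.loc[df.index, df.columns] = df
res = res.fillna(0).reset_index()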

Note that this works only if no Item falls outside the range [1, 11].
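That assumption can be checked before building the index. A minimal sketch, using made-up data and the column name "Item" from this answer; whether out-of-range rows should be dropped, reported, or treated as an error is left to the caller.

```python
import pandas as pd

# Made-up data for illustration; Item 12 falls outside 1..11.
df = pd.DataFrame({
    "store_id": [1, 1, 1],
    "period_id": [10, 10, 11],
    "Item": [1, 12, 3],
    "z": [7, 8, 9],
})

# Series.between is inclusive on both ends by default.
in_range = df["Item"].between(1, 11)

out_of_range = df[~in_range]   # rows that violate the assumption
df_in_range = df[in_range]     # rows that are safe to place on the full index
```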

edited Feb 20, 2020 at 19:20

answered Feb 20, 2020 at 19:09

filbranden · Accepted Answer · 2020-02-21 07:23:38Z

This is a simplification of @GrzegorzSkibinski's correct answer.

This answer does not modify the original DataFrame. It uses fewer variables for intermediate data structures and replaces the use of map with a list comprehension.

I'm also using reindex() rather than creating a new DataFrame using the generated index and populating it with the original data.

import pandas as pd
import itertools

df.set_index(
    ["store_id", "period_id", "Item_x"]
).reindex(
    pd.MultiIndex.from_tuples([
        group + (item,)
        for group, item in itertools.product(
            df.groupby(["store_id", "period_id"]).groups, 
            range(1, 12),
        )],
        names=["store_id", "period_id", "Item_x"]
    ),
    fill_value=0,
).reset_index()

In testing, output matched what you listed as expected.
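For anyone who wants to run this end to end, here is a self-contained sketch of the same reindex approach wrapped in a function. The function name `fill_missing_items` and its parameters are hypothetical, and the sample data is adapted from the first answer with the item column renamed to "Item_x", which is an assumption about the asker's real column name.

```python
import itertools

import pandas as pd


def fill_missing_items(df, group_cols, item_col, items):
    """Return a new frame with one row per existing group and item.

    Assumes `group_cols` is a list of two or more columns (so group keys
    are tuples) and that (group, item) combinations in `df` are unique.
    Rows added for missing items are filled with 0.
    """
    full_index = pd.MultiIndex.from_tuples(
        [
            group + (item,)
            for group, item in itertools.product(
                df.groupby(group_cols).groups, items
            )
        ],
        names=group_cols + [item_col],
    )
    return (
        df.set_index(group_cols + [item_col])
        .reindex(full_index, fill_value=0)
        .reset_index()
    )


df = pd.DataFrame({
    "store_id": [1160962] * 11,
    "period_id": [1025] * 6 + [1026] * 5,
    "Item_x": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
    "z": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
})

result = fill_missing_items(df, ["store_id", "period_id"], "Item_x", range(1, 12))

# 2 groups x 11 items = 22 rows, 11 per (store_id, period_id) pair.
rows_per_group = result.groupby(["store_id", "period_id"]).size()
```

Item 2 is missing from period 1025 in the input, so its row in `result` has z equal to 0, while the existing row for item 2 in period 1026 keeps its original value.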
