Please see the input and expected output below. For each store_id and period_id, 11 items should be present. If any item is missing, add a row for it and fill that row with 0, without using a loop.

Any help is highly appreciated.

input

enter image description here

Expected Output

enter image description here

3 Answers

1

You can do this:

Sample df:

import pandas as pd

df = pd.DataFrame({'store_id': [1160962, 1160962, 1160962, 1160962, 1160962, 1160962, 1160962, 1160962, 1160962, 1160962, 1160962],
                   'period_id': [1025, 1025, 1025, 1025, 1025, 1025, 1026, 1026, 1026, 1026, 1026],
                   'item_x': [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
                   'z': [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7]})

Solution:

num = range(1,12)
def f(x):
    return x.reindex(num, fill_value=0)\
                   .assign(store_id=x['store_id'].mode()[0], period_id = x['period_id'].mode()[0])

df.set_index('item_x').groupby(['store_id','period_id'], group_keys=False).apply(f).reset_index()
1 Comment

This doesn't seem to work correctly: the item_x values for the first period are respected, but those for the second period are not (they are copied from the first). It seems to me that using reindex on a groupby might be the source of the issue...
1

You can do:

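from itertools import product

import pandas as pd

# Every (store_id, period_id) pair present in the data, crossed with items 1..11
pdindex = product(df.groupby(["store_id", "period_id"]).groups, range(1, 12))
pdindex = pd.MultiIndex.from_tuples(map(lambda x: (*x[0], x[1]), pdindex), names=["store_id", "period_id", "Item"])

df = df.set_index(["store_id", "period_id", "Item"])

# Empty frame over the full index, then copy the existing rows into it
res = pd.DataFrame(index=pdindex, columns=df.columns)
res.loc[df.index, df.columns] = df

# Rows that were missing are still NaN at this point; fill them with 0
res = res.fillna(0).reset_index()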

Note that this works only if no Item falls outside the range [1, 11].
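
The caveat above can be checked before building the full index. The following is a minimal sketch, assuming the item column is named Item as in this answer; the sample values and the names expected_items and unexpected are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column name "Item" comes from the answer.
df = pd.DataFrame({
    "store_id": [1, 1, 1],
    "period_id": [1025, 1025, 1026],
    "Item": [1, 4, 12],  # 12 lies outside the expected range 1..11
    "z": [10, 40, 120],
})

# The items every (store_id, period_id) pair is expected to have.
expected_items = list(range(1, 12))

# Rows whose Item is not one of the expected values.
unexpected = df.loc[~df["Item"].isin(expected_items)]

if not unexpected.empty:
    print("Rows with items outside 1..11:")
    print(unexpected)
```

If unexpected is not empty, those rows need to be handled (dropped or remapped, depending on the data) before the approach above is applied.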

0

This is a simplification of @GrzegorzSkibinski's correct answer.

This answer does not modify the original DataFrame. It uses fewer variables to store intermediate data structures and employs a list comprehension in place of map.

I'm also using reindex() rather than creating a new DataFrame using the generated index and populating it with the original data.

import pandas as pd
import itertools

df.set_index(
    ["store_id", "period_id", "Item_x"]
).reindex(
    pd.MultiIndex.from_tuples([
        group + (item,)
        for group, item in itertools.product(
            df.groupby(["store_id", "period_id"]).groups, 
            range(1, 12),
        )],
        names=["store_id", "period_id", "Item_x"]
    ),
    fill_value=0,
).reset_index()

In testing, the output matched what you listed as expected.
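
For anyone who wants to try the reindex approach end to end, here is a self-contained sketch. It reuses the sample frame from the first answer, so the item column is named item_x (lower case) here rather than Item_x; the variable names keys, full_index and result are my own.

```python
import itertools

import pandas as pd

# Sample frame taken from the first answer (column name item_x as used there).
df = pd.DataFrame({
    "store_id": [1160962] * 11,
    "period_id": [1025] * 6 + [1026] * 5,
    "item_x": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
    "z": [1, 4, 5, 6, 7, 8, 1, 2, 5, 6, 7],
})

keys = ["store_id", "period_id", "item_x"]

# One index entry per existing (store_id, period_id) pair and per item 1..11.
full_index = pd.MultiIndex.from_tuples(
    [
        group + (item,)
        for group, item in itertools.product(
            df.groupby(["store_id", "period_id"]).groups,
            range(1, 12),
        )
    ],
    names=keys,
)

# Reindex against the full index; rows that did not exist are filled with 0.
result = df.set_index(keys).reindex(full_index, fill_value=0).reset_index()

# Each (store_id, period_id) pair should now have 11 rows.
print(result.groupby(["store_id", "period_id"]).size())
```

Note that only store and period combinations already present in the data are expanded, so no new store or period is invented along the way.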
