I need to create a column Period_Subcategory from the values of other columns: for each Category, a dictionary of the form {Period value: [list of Sub_Category values]}.

Input df:

Period   Category    Sub_Category
FY18Q1   Clothing    Shirt    
FY18Q2   Clothing    Trouser
FY18Q1   Clothing    Shirt
FY18Q3   Clothing    Pant 
FY18Q1   Accessories Watch
FY18Q2   Accessories Muff
FY18Q2   Accessories Watch
FY18Q3   Accessories Chains

Desired output df_output:

Category    Period_Subcategory
Clothing    {'FY18Q1':'Shirt','FY18Q2':'Trouser','FY18Q3':'Pant'}
Accessories {'FY18Q1':'Watch','FY18Q2':['Muff','Watch'],'FY18Q3':'Chains'}

3 Answers

One liner

>>> df2 = df.groupby(by=["Category", "Period"]).agg(lambda x: list(set(x))).reset_index().groupby("Category").apply(lambda x: dict(zip(x["Period"], x["Sub_Category"])))
>>> df2
Category
Accessories    {'FY18Q1': ['Watch'], 'FY18Q2': ['Watch', 'Muf...
Clothing       {'FY18Q1': ['Shirt'], 'FY18Q2': ['Trouser'], '...
dtype: object
>>> df2.values
array([{'FY18Q1': ['Watch'], 'FY18Q2': ['Watch', 'Muff'], 'FY18Q3': ['Chains']},
       {'FY18Q1': ['Shirt'], 'FY18Q2': ['Trouser'], 'FY18Q3': ['Pant']}],
      dtype=object)

2 Comments

You want groupby(..., sort=False) to keep the original order 'Clothing','Accessories'.
Thanks. Also, the structure of the values in the dictionary column requested by the OP is not consistent (either a string or a list).
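
To make the two comments above concrete, here is a minimal sketch that keeps the original category order with sort=False and reproduces the mixed string/list structure from the question. The helper name collapse and the explicit dict comprehension are my own choices, not part of the answer; the sample frame is rebuilt from the question's input.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Period": ["FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3",
               "FY18Q1", "FY18Q2", "FY18Q2", "FY18Q3"],
    "Category": ["Clothing"] * 4 + ["Accessories"] * 4,
    "Sub_Category": ["Shirt", "Trouser", "Shirt", "Pant",
                     "Watch", "Muff", "Watch", "Chains"],
})

def collapse(values):
    """Hypothetical helper: unique values in first-seen order.

    A single unique value is returned as a bare string, several as a list,
    which mirrors the inconsistent structure in the desired output.
    """
    uniq = list(dict.fromkeys(values))
    return uniq[0] if len(uniq) == 1 else uniq

# sort=False keeps groups in order of first appearance (Clothing, Accessories).
result = {
    category: {
        period: collapse(sub["Sub_Category"])
        for period, sub in grp.groupby("Period", sort=False)
    }
    for category, grp in df.groupby("Category", sort=False)
}

df_output = pd.DataFrame({
    "Category": list(result),
    "Period_Subcategory": list(result.values()),
})
print(df_output)
```

If a uniform structure is preferable for later processing, returning uniq unconditionally from collapse gives a list for every period.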

Write a function that constructs the dictionary and apply it to your dataframe, grouped by category:

def make_dict(df):
    # Map each Period in this group to its unique Sub_Category values
    d = {}
    for period in sorted(set(df.Period)):
        d[period] = list(set(df.Sub_Category[df.Period == period]))
    return d

# One dictionary per Category
df_output = df.groupby('Category').apply(make_dict)
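
For reference, a self-contained run of this approach on the question's data might look like the sketch below. Two details are my assumptions rather than part of the answer: the explicit column selection before apply (so the function only sees Period and Sub_Category), and the rename/reset_index step that produces the column names the question asks for.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Period": ["FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3",
               "FY18Q1", "FY18Q2", "FY18Q2", "FY18Q3"],
    "Category": ["Clothing"] * 4 + ["Accessories"] * 4,
    "Sub_Category": ["Shirt", "Trouser", "Shirt", "Pant",
                     "Watch", "Muff", "Watch", "Chains"],
})

def make_dict(group):
    # Map each Period in this group to its unique Sub_Category values.
    # set() removes duplicates but does not guarantee any order in the lists.
    d = {}
    for period in sorted(set(group.Period)):
        d[period] = list(set(group.Sub_Category[group.Period == period]))
    return d

df_output = (
    df.groupby("Category")[["Period", "Sub_Category"]]  # assumption: pass only the needed columns
      .apply(make_dict)
      .rename("Period_Subcategory")
      .reset_index()
)
print(df_output)
```

Because the lists come from set(), the order inside a list such as the one for 'FY18Q2' may differ between runs; sort the list if a stable order matters.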

I can nearly get this with:

pd.DataFrame.from_records(
    [(a, dict(zip(g['Period'], g['Sub_Category']))) for (a, g) in df.groupby('Category', group_keys=False)],
    columns=['Category', 'Period_Subcategory'],
).set_index('Category')

                                                     Period_Subcategory
Category                                                               
Accessories  {'FY18Q1': 'Watch', 'FY18Q2': 'Watch', 'FY18Q3': 'Chains'}
Clothing     {'FY18Q1': 'Shirt', 'FY18Q2': 'Trouser', 'FY18Q3': 'Pant'}

...except that dict(zip(...)) keeps only the last value for a repeated key, so it mishandles the duplicate for 'Accessories','FY18Q2': 'Muff' is overwritten by 'Watch' instead of both ending up in a list.
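
One possible way to close that gap, sketched under my own assumptions (the helper name collect is hypothetical, and every value becomes a list rather than the mixed string/list form in the question), is to accumulate values per period instead of relying on dict(zip(...)):

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Period": ["FY18Q1", "FY18Q2", "FY18Q1", "FY18Q3",
               "FY18Q1", "FY18Q2", "FY18Q2", "FY18Q3"],
    "Category": ["Clothing"] * 4 + ["Accessories"] * 4,
    "Sub_Category": ["Shirt", "Trouser", "Shirt", "Pant",
                     "Watch", "Muff", "Watch", "Chains"],
})

def collect(group):
    """Hypothetical helper: build {Period: [unique Sub_Category values]}."""
    out = {}
    for period, sub in zip(group["Period"], group["Sub_Category"]):
        bucket = out.setdefault(period, [])
        if sub not in bucket:  # skip repeats such as the second 'Shirt'
            bucket.append(sub)
    return out

records = [(cat, collect(g)) for cat, g in df.groupby("Category")]
df_output = pd.DataFrame.from_records(
    records, columns=["Category", "Period_Subcategory"]
).set_index("Category")
print(df_output)
```

Unlike dict(zip(...)), setdefault appends to an existing list when a period repeats, so both 'Muff' and 'Watch' survive for 'FY18Q2'.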
