0

I have a panas dataframe as follows:

 df 

   Prod  ProdDesc   tot    avg   qtr        val_qtr
   A      Cyl       110   8.7    202301     12
   A      Cyl       110   8.7    202302     56.9
   A      Cyl       110   8.7    202303      9
   A      Cyl       110   8.7    202304      0

So what I want is to stack/transpose the dataframe. I used pandas melt,

    df_tra = df.melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value')
    df_tra.drop_duplicates()

So my output comes as :

df_tra


    Prod  ProdDesc  Attrib   Value
     A    Cyl       tot      110           
     A    Cyl       avg      8.7           
     A    Cyl       quarter  202301    
     A    Cyl       quarter  202302
     A    Cyl       quarter  202303
     A    Cyl       quarter  202304        
     A    Cyl       val_qtr  12    
     A    Cyl       val_qtr  56.9
     A    Cyl       val_qtr  9
     A    Cyl       val_qtr  0  

but the output what I want/desire is different. What I want is the following:

 df_actual_wanted 

    Prod  ProdDesc  Attrib   Value
    A     Cyl       tot      110           
    A     Cyl       avg      8.7           
    A     Cyl       202301   12 
    A     Cyl       202302   56.9
    A     Cyl       202303    9
    A     Cyl       202304    0     

How can I achieve that?

1 Answer 1

5

Select multiple columns with DataFrame.drop_duplicates and DataFrame.melt, and join with snoter subset with rename by concat, last if necessary sorting by both columns:

df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], var_name='Attrib', value_name='Value'))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_values(['Prod','ProdDesc'], ignore_index=True)
print (out)
  Prod ProdDesc  Attrib  Value
0    A      Cyl     tot  110.0
1    A      Cyl     avg    8.7
2    A      Cyl  202301   12.0
3    A      Cyl  202302   56.9
4    A      Cyl  202303    9.0
5    A      Cyl  202304    0.0

If default index and sorting need same like original change solution a bit:

print (df)
   Prod ProdDesc  tot   avg     qtr  val_qtr
0     A      Cyl  110  8.70  202301     12.0
1     A      Cyl  110  8.70  202302     56.9
2     A      Cyl  110  8.70  202303      9.0
3     A      Cyl  110  8.70  202304      0.0
4     B      Cyl  133  8.76  202301     12.0
5     B      Cyl  133  8.76  202302     56.9
6     B      Cyl  133  8.76  202303      9.0
7     B      Cyl  133  8.76  202304      0.0
8     A     Cyl1  117  8.37  202301     12.0
9     A     Cyl1  117  8.37  202302     56.9
10    A     Cyl1  117  8.37  202303      9.0
11    A     Cyl1  117  8.37  202304      0.0

df1 = (df[['Prod','ProdDesc','tot','avg']]
               .drop_duplicates()
               .melt(id_vars=['Prod', 'ProdDesc'], 
                     var_name='Attrib', 
                     value_name='Value',
                     ignore_index=False))
df2 = (df[['Prod','ProdDesc','qtr','val_qtr']]
               .rename(columns={'qtr':'Attrib','val_qtr':'Value'}))

out = pd.concat([df1, df2]).sort_index(kind='stable', ignore_index=True)

print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00

If small data or performance is not important:

def f(x):
    y = x[['tot','avg']].iloc[0].T.reset_index().set_axis(['Attrib', 'Value'], axis=1)
    return pd.concat([y, x[['Attrib','Value']]])

out = (df.rename(columns={'qtr':'Attrib','val_qtr':'Value'})
         .groupby(['Prod', 'ProdDesc'], sort=False)
         .apply(f)
         .droplevel(-1)
         .reset_index())
print (out)
   Prod ProdDesc  Attrib   Value
0     A      Cyl     tot  110.00
1     A      Cyl     avg    8.70
2     A      Cyl  202301   12.00
3     A      Cyl  202302   56.90
4     A      Cyl  202303    9.00
5     A      Cyl  202304    0.00
6     B      Cyl     tot  133.00
7     B      Cyl     avg    8.76
8     B      Cyl  202301   12.00
9     B      Cyl  202302   56.90
10    B      Cyl  202303    9.00
11    B      Cyl  202304    0.00
12    A     Cyl1     tot  117.00
13    A     Cyl1     avg    8.37
14    A     Cyl1  202301   12.00
15    A     Cyl1  202302   56.90
16    A     Cyl1  202303    9.00
17    A     Cyl1  202304    0.00
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.