Pivot pandas dataframe using group by

Question

I have a dataframe which is like this:

id  sub_id  count
0   94      1
1   94      9
1   315     7
2   94      4
2   265     1


import pandas as pd

data = {'id': [0, 1, 1, 2, 2],
        'sub_id': [94, 94, 315, 94, 265],
        'count': [1, 9, 7, 4, 1]}
df = pd.DataFrame(data)

And I want it in the following form:
id sub_id1 count_sub_id1 sub_id2 count_sub_id2
0  94      1             NaN     NaN
1  94      9             315     7
2  94      4             265     1

Note: each id has at most two rows, each with a different sub_id and its own count.

I tried df.pivot(index='id', columns='sub_id', values='count'), but this turns every distinct sub_id value into its own column. I only need two sub_id columns (plus their counts), with custom names, holding the one or two rows that exist for each id.

Scott Boston · Accepted Answer · 2019-09-16 16:06:46Z

3

Try using:

df_out = (df.set_index(['id', df.groupby('id').cumcount()+1])
            .unstack().sort_index(level=1, axis=1))

df_out.columns = [f'{i}{j}' if i == "sub_id" else f'{i}_sub_id{j}' 
                          for i, j in df_out.columns]

print(df_out.reset_index())

Output:

   id  count_sub_id1  sub_id1  count_sub_id2  sub_id2
0   0            1.0     94.0            NaN      NaN
1   1            9.0     94.0            7.0    315.0
2   2            4.0     94.0            1.0    265.0
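
The values above come out as floats because unstack fills the missing second row with NaN, and the column order differs from the one in the question. As a minimal sketch of the same cumcount-and-unstack idea run end to end, assuming a pandas version with the nullable Int64 dtype, the following reorders the columns and keeps integers as integers. The names position, wide, ordered and result are illustrative choices, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 1, 2, 2],
    'sub_id': [94, 94, 315, 94, 265],
    'count': [1, 9, 7, 4, 1],
})

# Number the rows within each id: 1 for the first row, 2 for the second.
position = df.groupby('id').cumcount() + 1

# Move that counter into the index, then unstack it into the columns.
wide = df.set_index(['id', position]).unstack()

# Flatten the (name, position) column pairs into the requested names.
wide.columns = [
    f'{name}{pos}' if name == 'sub_id' else f'{name}_sub_id{pos}'
    for name, pos in wide.columns
]

# Assumption: the question's column order is wanted, and missing values
# should be <NA> in integer columns rather than NaN in float columns.
ordered = ['sub_id1', 'count_sub_id1', 'sub_id2', 'count_sub_id2']
result = wide[ordered].astype('Int64').reset_index()
print(result)
```

The key step is the cumcount: it gives every id the same small set of positions (1, 2), so unstacking on it yields a fixed number of columns regardless of how many distinct sub_id values exist.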

edited Sep 16, 2019 at 16:06

answered Sep 16, 2019 at 16:01


Brian · Answer · 2019-09-16 16:10:19Z

1

output_df = pd.concat([df.groupby('id')['sub_id'].apply(list).apply(pd.Series),
                   df.groupby('id')['count'].apply(list).apply(pd.Series)], axis =1)

output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']

>>> output_df

    sub_id1  sub_id2  count_sub_id1  count_sub_id2
0      94.0      NaN            1.0            NaN
1      94.0    315.0            9.0            7.0
2      94.0    265.0            4.0            1.0
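
Chaining .apply(list).apply(pd.Series) creates one Series per group, which may become slow on larger frames. A possible variation, sketched here under the question's assumption of at most two rows per id, collects the lists once and hands them to the DataFrame constructor. The helper widen and its parameters are hypothetical names introduced for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 1, 2, 2],
    'sub_id': [94, 94, 315, 94, 265],
    'count': [1, 9, 7, 4, 1],
})

def widen(frame, column, names):
    """Hypothetical helper: spread one column's per-id values across `names`."""
    # One list of values per id, e.g. id 1 -> [94, 315].
    lists = frame.groupby('id')[column].agg(list)
    # Shorter lists are padded with missing values by the constructor.
    block = pd.DataFrame(lists.tolist(), index=lists.index)
    # Force exactly len(names) columns; rows beyond that are dropped,
    # which is only safe under the "at most two rows per id" assumption.
    block = block.reindex(columns=list(range(len(names))))
    block.columns = names
    return block

output_df = pd.concat(
    [widen(df, 'sub_id', ['sub_id1', 'sub_id2']),
     widen(df, 'count', ['count_sub_id1', 'count_sub_id2'])],
    axis=1,
)
print(output_df)
```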

answered Sep 16, 2019 at 16:10


Quang Hoang · Answer · 2019-09-16 16:12:24Z

1

Here's another way:

df_out = (df.groupby('id')
   .apply(lambda x: x.reset_index(drop=True).head(2))
   .drop('id', axis=1)
   .unstack()
)

Output:

   sub_id        count     
        0      1     0    1
id                         
0    94.0    NaN   1.0  NaN
1    94.0  315.0   9.0  7.0
2    94.0  265.0   4.0  1.0

To rename:

df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
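
Since the question started from df.pivot, it may be worth noting that pivot can also produce this shape once it is given a per-id row counter as the columns argument instead of sub_id itself. This is a sketch rather than part of the answer above; the helper column name n is made up for illustration, and it assumes a pandas version where pivot accepts a list for values.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 1, 2, 2],
    'sub_id': [94, 94, 315, 94, 265],
    'count': [1, 9, 7, 4, 1],
})

# Add a 1-based position within each id; 'n' is an arbitrary helper name.
numbered = df.assign(n=df.groupby('id').cumcount() + 1)

# Pivot on the position instead of on sub_id, so each value column
# expands into at most two columns.
wide = numbered.pivot(index='id', columns='n', values=['sub_id', 'count'])

# Flatten the (name, position) pairs into the names from the question.
wide.columns = [
    f'{name}{pos}' if name == 'sub_id' else f'{name}_sub_id{pos}'
    for name, pos in wide.columns
]
print(wide.reset_index())
```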

answered Sep 16, 2019 at 16:12
