I have a DataFrame like this:

id  sub_id  count
0   94  1
1   94  9
1   315 7
2   94  4
2   265 1


import pandas as pd

data = {'id': [0, 1, 1, 2, 2],
        'sub_id': [94, 94, 315, 94, 265],
        'count': [1, 9, 7, 4, 1]
       }
df = pd.DataFrame(data)

And I want it in the following form:
id sub_id1 count_sub_id1 sub_id2 count_sub_id2
0  94      1             NaN     NaN
1  94      9             315     7
2  94      4             265     1

Note: every id can have at most two rows, each with a different sub_id and its count.

I tried df.pivot(index='id', columns='sub_id', values='count'), but this turns every distinct sub_id value into its own column. I only need two sets of columns with custom names, built from the (at most two) rows that exist for each id.
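A minimal sketch of why the pivot attempt spreads out: `pivot` uses each distinct value of the `columns` argument as a new column, so `sub_id` yields one column per value (94, 265, 315). Pivoting instead on each row's position within its `id` group gives at most two slots per id. The helper column `pos` is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Pivoting on 'sub_id' creates one column per distinct sub_id value.
wide_by_value = df.pivot(index='id', columns='sub_id', values='count')
print(wide_by_value.columns.tolist())  # [94, 265, 315]

# Hypothetical helper column: 1-based position of the row inside its id group.
pos_df = df.assign(pos=df.groupby('id').cumcount() + 1)

# Pivoting on that position gives (sub_id, 1), (sub_id, 2), (count, 1), (count, 2).
wide_by_pos = pos_df.pivot(index='id', columns='pos', values=['sub_id', 'count'])
print(wide_by_pos)
```

The flattened column names still need to be built by hand, which is what the answers below do.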

3 Answers

Try using:

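# Number each row within its id group (1, 2, ...) and use that as a second index level
df_out = (df.set_index(['id', df.groupby('id').cumcount()+1])
            # Move the position level into the columns, then order columns by position
            .unstack().sort_index(level=1, axis=1))

# Flatten ('sub_id', 1) -> 'sub_id1' and ('count', 1) -> 'count_sub_id1'
df_out.columns = [f'{i}{j}' if i == "sub_id" else f'{i}_sub_id{j}'
                          for i, j in df_out.columns]

print(df_out.reset_index())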

Output:

   id  count_sub_id1  sub_id1  count_sub_id2  sub_id2
0   0            1.0     94.0            NaN      NaN
1   1            9.0     94.0            7.0    315.0
2   2            4.0     94.0            1.0    265.0
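To see why this works, the sketch below prints the helper key produced by `cumcount() + 1` and checks that `(id, position)` is unique, which is what lets `unstack` move the position into the columns without aggregating. It also reorders the columns to the order shown in the question (sub_id before count); that reordering step is an addition for illustration, not part of the answer above, and the names `slot` and `wide` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# cumcount numbers rows within each id group starting at 0; +1 makes it 1-based.
slot = df.groupby('id').cumcount() + 1
print(slot.tolist())  # [1, 1, 2, 1, 2]

# (id, slot) is unique, so unstack only reshapes and never has to combine rows.
keyed = df.set_index(['id', slot])
print(keyed.index.is_unique)  # True

wide = keyed.unstack()
# Flatten ('sub_id', 1) -> 'sub_id1' and ('count', 1) -> 'count_sub_id1'.
wide.columns = [f'{c}{n}' if c == 'sub_id' else f'{c}_sub_id{n}' for c, n in wide.columns]

# Put the columns in the order requested in the question.
wide = wide[['sub_id1', 'count_sub_id1', 'sub_id2', 'count_sub_id2']].reset_index()
print(wide)
```

Because the second slot is missing for id 0, pandas upcasts those columns to float, which is why the answer's output shows 94.0 rather than 94.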
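# Collect each id's sub_ids (and counts) into a list, then spread each list into columns;
# an id with only one row gets NaN in the second column
output_df = pd.concat([df.groupby('id')['sub_id'].apply(list).apply(pd.Series),
                       df.groupby('id')['count'].apply(list).apply(pd.Series)], axis=1)

output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']

>>> output_df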

        sub_id1 sub_id2 count_sub_id1   count_sub_id2
0       94.0    NaN     1.0             NaN
1       94.0    315.0   9.0             7.0
2       94.0    265.0   4.0             1.0
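The answer above hard-codes four column labels, which assumes exactly two rows per id somewhere in the data. A minimal variation, assuming the same `df`: it swaps `.apply(pd.Series)` for the `pd.DataFrame` constructor (which pads shorter lists with missing values), derives the labels from the actual width, and interleaves them in the question's order. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

grouped = df.groupby('id')
# One list per id, e.g. id 1 -> [94, 315] and id 0 -> [94].
subs = grouped['sub_id'].agg(list)
counts = grouped['count'].agg(list)

# A DataFrame built from ragged lists pads the short rows with missing values.
sub_wide = pd.DataFrame(subs.tolist(), index=subs.index)
count_wide = pd.DataFrame(counts.tolist(), index=counts.index)

# Derive the labels from the real width instead of hard-coding them.
n = sub_wide.shape[1]
sub_wide.columns = [f'sub_id{k}' for k in range(1, n + 1)]
count_wide.columns = [f'count_sub_id{k}' for k in range(1, n + 1)]

# Interleave as sub_id1, count_sub_id1, sub_id2, count_sub_id2, ...
order = [name for k in range(1, n + 1) for name in (f'sub_id{k}', f'count_sub_id{k}')]
output_df = pd.concat([sub_wide, count_wide], axis=1)[order].reset_index()
print(output_df)
```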


Here's another way:

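# Selecting the value columns up front keeps 'id' out of each group,
# so there is no need for .drop('id', axis=1) afterwards
df_out = (df.groupby('id')[['sub_id', 'count']]
   # Renumber each group's rows 0, 1, ... and keep at most two of them
   .apply(lambda x: x.reset_index(drop=True).head(2))
   # Move the 0/1 row numbers into the columns
   .unstack()
)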

Output:

   sub_id        count     
        0      1     0    1
id                         
0    94.0    NaN   1.0  NaN
1    94.0  315.0   9.0  7.0
2    94.0  265.0   4.0  1.0

To rename:

df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
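The `.head(2)` call is what enforces the "at most two rows" rule. The sketch below uses hypothetical data in which id 3 has three rows, to show that the third row is simply dropped. It also applies the question's naming (`count_sub_id1` rather than `count1`); that naming tweak is an illustrative variation, not part of the answer.

```python
import pandas as pd

# Hypothetical data: id 3 has three rows, more than the question allows.
df = pd.DataFrame({'id': [0, 1, 1, 3, 3, 3],
                   'sub_id': [94, 94, 315, 94, 265, 400],
                   'count': [1, 9, 7, 4, 1, 2]})

df_out = (df.groupby('id')[['sub_id', 'count']]
            # head(2) keeps only the first two rows of each id
            .apply(lambda x: x.reset_index(drop=True).head(2))
            .unstack())

# Row numbers are 0-based, so add 1; name count columns as in the question.
df_out.columns = [f'{i}{j + 1}' if i == 'sub_id' else f'{i}_sub_id{j + 1}'
                  for i, j in df_out.columns]
print(df_out)
```

With this data the sub_id 400 row for id 3 never appears in the result, and no third pair of columns is created.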
