Pandas - unstack with duplicates

Question

I have tried several approaches to unstack a DataFrame that contains duplicate entries, so far without success. I would be grateful for any help.

I have a DataFrame in 'long' format:

| id | variable  | value |
|----|-----------|-------|
| 1  | outcome_1 | NaN   |
| 2  | outcome_1 | 18:33 |
| 2  | outcome_1 | 20:39 |
| 2  | outcome_3 | 01:40 |
| 3  | outcome_2 | 03:59 |
| 3  | outcome_4 | 07:46 |
| 3  | outcome_3 | 10:53 |

I would like to convert it to a 'wide' format without aggregating, preserving all values, so that the result looks like this:

| id_nmbr | outcome_1_0 | outcome_1_1 | outcome_2_0 | outcome_3_0 | outcome_4_0 |
|---------|-------------|-------------|-------------|-------------|-------------|
| 1       | NaN         | NaN         | NaN         | NaN         | NaN         |
| 2       | 18:33       | 20:39       | NaN         | 01:40       | NaN         |
| 3       | NaN         | NaN         | 03:59       | 10:53       | 07:46       |

In other words, every value should be kept, and each duplicate should get its own new column.

I have tried pivot, unstack, and pivot_table, but I suspect I need to chain several functions together to achieve this. Any ideas?
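For anyone reproducing the problem, here is a minimal sketch of the input. The construction of `df` is an assumption based on the table above (the values are kept as strings, with `NaN` for the missing entry). It also shows why a plain `pivot` is not enough: the pair (2, outcome_1) occurs twice, so the reshape has no unique cell to put each value in.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the 'long' table from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 3, 3, 3],
    "variable": ["outcome_1", "outcome_1", "outcome_1", "outcome_3",
                 "outcome_2", "outcome_4", "outcome_3"],
    "value": [np.nan, "18:33", "20:39", "01:40", "03:59", "07:46", "10:53"],
})

# Rows whose (id, variable) pair is not unique.
dupes = df[df.duplicated(["id", "variable"], keep=False)]
print(dupes)

# A plain pivot refuses to reshape when the index/column pairs repeat.
try:
    df.pivot(index="id", columns="variable", values="value")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

print(pivot_error)
```
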

jezrael · Accepted Answer · 2021-10-18 11:42:43Z

2

Use GroupBy.cumcount to number the duplicates within each (id, variable) group, reshape with Series.unstack, sort the resulting MultiIndex columns, and flatten them with map:

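# running counter (0, 1, ...) within each (id, variable) group
g = df.groupby(['id','variable']).cumcount()

# the counter makes the index unique, so unstack no longer fails on duplicates
df = df.set_index(['id','variable', g])['value'].unstack([1,2]).sort_index(axis=1)
# flatten the (variable, counter) MultiIndex into names like outcome_1_0
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print(df)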
   id outcome_1_0 outcome_1_1 outcome_2_0 outcome_3_0 outcome_4_0
0   1         NaN         NaN         NaN         NaN         NaN
1   2       18:33       20:39         NaN       01:40         NaN
2   3         NaN         NaN       03:59       10:53       07:46
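The same steps can be run end to end with the intermediate counter made visible. This is only a sketch: the sample frame is an assumed reconstruction of the question's table, and the names `counter` and `wide` are illustrative rather than part of the answer above.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's 'long' table.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 3, 3, 3],
    "variable": ["outcome_1", "outcome_1", "outcome_1", "outcome_3",
                 "outcome_2", "outcome_4", "outcome_3"],
    "value": [np.nan, "18:33", "20:39", "01:40", "03:59", "07:46", "10:53"],
})

# Position of each row within its (id, variable) group; only the
# second (2, outcome_1) row gets a counter of 1.
counter = df.groupby(["id", "variable"]).cumcount()
print(counter.tolist())

# (id, variable, counter) is now unique, so the last two index levels
# can be moved into the columns without aggregation.
wide = (
    df.set_index(["id", "variable", counter])["value"]
      .unstack([1, 2])
      .sort_index(axis=1)
)

# Flatten the two-level column index into single strings.
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

The counter is what turns the duplicated pair into two distinct column keys, (outcome_1, 0) and (outcome_1, 1), so no value has to be dropped or aggregated.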

edited Oct 18, 2021 at 11:42

answered Oct 18, 2021 at 11:37

sammywemmy · Answer · 2021-10-18 20:23:09Z

1

The pivot_wider function from pyjanitor can abstract the reshaping step:

# pip install pyjanitor
import pandas as pd
import janitor

# the cumcount helps to get a unique index
(df.assign(counter = df.groupby(['id', 'variable']).cumcount())
   .pivot_wider(index='id',
                names_from=['variable', 'counter'],
                values_from='value')
)
   id outcome_1_0 outcome_1_1 outcome_3_0 outcome_2_0 outcome_4_0
0   1         NaN         NaN         NaN         NaN         NaN
1   2       18:33       20:39       01:40         NaN         NaN
2   3         NaN         NaN       10:53       03:59       07:46
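If adding a dependency is not desirable, a similar result can be sketched with pandas alone. This assumes pandas 1.1 or later, where `DataFrame.pivot` accepts a list for `columns`. The sample frame is an assumed reconstruction of the question's table. Unlike the output above, the columns are sorted here rather than kept in order of appearance.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's 'long' table.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 3, 3, 3],
    "variable": ["outcome_1", "outcome_1", "outcome_1", "outcome_3",
                 "outcome_2", "outcome_4", "outcome_3"],
    "value": [np.nan, "18:33", "20:39", "01:40", "03:59", "07:46", "10:53"],
})

wide = (
    # the counter column makes each (id, variable, counter) triple unique
    df.assign(counter=df.groupby(["id", "variable"]).cumcount())
      .pivot(index="id", columns=["variable", "counter"], values="value")
      .sort_index(axis=1)
)

# Join the two column levels with an underscore, e.g. outcome_1_0.
wide.columns = [f"{variable}_{counter}" for variable, counter in wide.columns]
wide = wide.reset_index()
print(wide)
```
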

answered Oct 18, 2021 at 20:23
