2

I have tried to unstack a dataframe with duplicates jumping through different hoops. So far without result. I would be grateful for any help:

I have a dataframe that have a 'long' format:

| id | variable  | value |
|----|-----------|-------|
| 1  | outcome_1 | NaN   |
| 2  | outcome_1 | 18:33 |
| 2  | outcome_1 | 20:39 |
| 2  | outcome_3 | 01:40 |
| 3  | outcome_2 | 03:59 |
| 3  | outcome_4 | 07:46 |
| 3  | outcome_3 | 10:53 |

And would like to convert it to a 'wide' format, but without aggregation and preserving all values, so the result would look like this:

| id_nmbr | outcome_1_0 | outcome_1_1 | outcome_2_0 | outcome_3_0 | outcome_4_0 |
|---------|-------------|-------------|-------------|-------------|-------------|
| 1       | NaN         | NaN         | NaN         | NaN         | NaN         |
| 2       | 18:33       | 20:39       | NaN         | 01:40       | NaN         |
| 3       | NaN         | NaN         | 03:59       | 07:46       | 10:53       |

So basically, preserve each value, and create a new column for each duplicate.

I have tried pivot or unstack, as well as pivot_table, but I think I need to string some functions together to achieve it. Any ideas?

2 Answers 2

2

Use GroupBy.cumcount for counter, then reshape by Series.unstack with sorting MultiIndex and flatten in map:

g = df.groupby(['id','variable']).cumcount()

df = df.set_index(['id','variable', g])['value'].unstack([1,2]).sort_index(axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df.reset_index()
print (df)
   id outcome_1_0 outcome_1_1 outcome_2_0 outcome_3_0 outcome_4_0
0   1         NaN         NaN         NaN         NaN         NaN
1   2       18:33       20:39         NaN       01:40         NaN
2   3         NaN         NaN       03:59       10:53       07:46
Sign up to request clarification or add additional context in comments.

Comments

1

pivot_wider function from pyjanitor can help to abstract the reshaping process:

# pip install pyjanitor
import pandas as pd
import janitor

 # the cumcount helps to get a unique index
(df.assign(counter = df.groupby(group).cumcount())
   .pivot_wider(index='id', 
                names_from=['variable', 'counter'], 
                values_from='value')
) 
   id outcome_1_0 outcome_1_1 outcome_3_0 outcome_2_0 outcome_4_0
0   1         NaN         NaN         NaN         NaN         NaN
1   2       18:33       20:39       01:40         NaN         NaN
2   3         NaN         NaN       10:53       03:59       07:46


Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.