Transpose row values into existing predefined columns in pandas dataframe

Question

I have a dataframe sorted by amount giving me top 5 categories per Name like this:

| Name | Category | Amount |
|------|----------|--------|
| Abel | A        | 9.2    |
| Abel | B        | 3      |
| Abel | C        | 2.5    |
| Abel | E        | 2      |
| Abel | X        | 0      |
| Cain | W        | 93     |
| Cain | A        | 2      |

This is what I want in the end:

| Name | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Cat 5 |
|------|-------|-------|-------|-------|-------|
| Abel | A     | B     | C     | E     | X     |
| Cain | W     | A     | -     | -     | -     |

I tried df.pivot("Name","Category"), but it uses the values (e.g. A, B, ...) as the column names. I want the 5 columns to be predefined as "Cat 1" to "Cat 5" instead, and I'm not sure how to get that result. Also, not all names have 5 rows. For example, Cain has only a top 2, which means the Cat 3, Cat 4 and Cat 5 columns should be null or "-". Any help? Thanks!

Updates:

OK, so for example, if all my names have only 2 category records, I still want to get 5 new columns for the top 5 categories (i.e. Cat 1, Cat 2, Cat 3, Cat 4, Cat 5).

Now if I do

df["g"] = top5_jmi.groupby("Name").cumcount().add(1)

This gives me only 2 columns if I pivot it later. How can I get 5 columns? For example, this input:

| Name | Category | Amount |
|------|----------|--------|
| Abel | A        | 9.2    |
| Abel | B        | 3      |
| Cain | W        | 93     |
| Cain | A        | 2      |

should still give me this:

| Name | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Cat 5 |
|------|-------|-------|-------|-------|-------|
| Abel | A     | B     | -     | -     | -     |
| Cain | W     | A     | -     | -     | -     |

jezrael · Accepted Answer · 2019-06-25 09:27:35Z

Use:

# create a per-Name counter column, used later for the column names
df['g'] = df.groupby('Name').cumcount().add(1)
# keep only the top 5 rows per Name
df = df[df['g'] <= 5]
# reshape by pivot (keyword arguments are required in pandas 2.0+)
df2 = (df.pivot(index='Name', columns='g', values='Category')
         .add_prefix('Type ')
         .reset_index()
         .rename_axis(None, axis=1)
         .fillna('-'))
print(df2)
   Name Type 1 Type 2 Type 3 Type 4 Type 5
0  Abel      A      B      C      E      X
1  Cain      W      A      -      -      -
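
As a self-contained sketch of the steps above, the following rebuilds the question's sample data and uses the "Cat " prefix the question asks for. It assumes the rows are already sorted by Amount within each Name, as the question states; the variable names `top` and `wide` are only illustrative.

```python
import pandas as pd

# Sample data from the question, already sorted by Amount within each Name
df = pd.DataFrame({
    "Name": ["Abel"] * 5 + ["Cain"] * 2,
    "Category": ["A", "B", "C", "E", "X", "W", "A"],
    "Amount": [9.2, 3, 2.5, 2, 0, 93, 2],
})

# Number the rows within each Name: 1, 2, 3, ...
df["g"] = df.groupby("Name").cumcount().add(1)

# Keep at most the first 5 rows per Name
top = df[df["g"] <= 5]

# One row per Name, one column per counter value
wide = (
    top.pivot(index="Name", columns="g", values="Category")
       .add_prefix("Cat ")
       .reset_index()
       .rename_axis(None, axis=1)
       .fillna("-")
)
print(wide)
```

Here all five columns appear only because Abel has five rows; with fewer rows per Name, the pivot alone produces fewer columns.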

EDIT: Use DataFrame.reindex to add the missing columns:

# create a per-Name counter column, used later for the column names
df['g'] = df.groupby('Name').cumcount().add(1)
# keep only the top 5 rows per Name
df = df[df['g'] <= 5]
# reshape by pivot, then force columns 1..5 to exist
df2 = (df.pivot(index='Name', columns='g', values='Category')
         .reindex(range(1, 6), axis=1)
         .add_prefix('Type ')
         .reset_index()
         .rename_axis(None, axis=1)
         .fillna('-'))
print(df2)
   Name Type 1 Type 2 Type 3 Type 4 Type 5
0  Abel      A      B      -      -      -
1  Cain      W      A      -      -      -
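
The reindex approach can also be wrapped in a small helper so the number of output columns is a parameter. This is a minimal sketch, not part of the original answer: the function name `top_categories_wide` and its `n` and `prefix` parameters are hypothetical, and it again assumes the input is already sorted by Amount within each Name.

```python
import pandas as pd

def top_categories_wide(df, n=5, prefix="Cat "):
    """Hypothetical helper: always return n category columns per Name."""
    # Number the rows within each Name without modifying the caller's frame
    ranked = df.assign(g=df.groupby("Name").cumcount().add(1))
    ranked = ranked[ranked["g"] <= n]
    return (
        ranked.pivot(index="Name", columns="g", values="Category")
              .reindex(columns=range(1, n + 1))  # add any missing positions
              .add_prefix(prefix)
              .reset_index()
              .rename_axis(None, axis=1)
              .fillna("-")
    )

# Second sample from the question: every Name has only 2 rows
df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Cain", "Cain"],
    "Category": ["A", "B", "W", "A"],
    "Amount": [9.2, 3, 93, 2],
})
wide = top_categories_wide(df)
print(wide)
```

Because the reindex runs before `add_prefix`, the column positions are still the integers 1 to n at that point, so `range(1, n + 1)` matches them directly.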

edited Jun 25, 2019 at 9:27

answered Jun 20, 2019 at 10:09

3 Comments

wayneloo Over a year ago

I just checked. If every Name has fewer than 5 types (for example only 1 or 2), the result has only a few columns, e.g. Type 1 and Type 2, and no more. But I still want Type 3, Type 4 and Type 5, with all values set to "-". This happens because cumcount() is based on the number of rows per Name. Is there a way to fix the output at 5 entries per Name, with "-" where there is no row?

jezrael Over a year ago

@AhSheng - I'm not sure I understand. What is the logic for omitting the first or second values and getting only the 3rd, 4th and 5th values? Can you explain more?

jezrael Over a year ago

I think in the sample data the values for Cain are in the first and second columns. What should change so that, e.g., the 3rd and 5th columns are filled with W and A?
