I have a dataframe sorted by amount giving me top 5 categories per Name like this:

| Name | Category | Amount |
|------|----------|--------|
| Abel | A        | 9.2    |
| Abel | B        | 3      |
| Abel | C        | 2.5    |
| Abel | E        | 2      |
| Abel | X        | 0      |
| Cain | W        | 93     |
| Cain | A        | 2      |

This is what I want in the end:

| Name | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Cat 5 |
|------|-------|-------|-------|-------|-------|
| Abel | A     | B     | C     | E     | X     |
| Cain | W     | A     | -     | -     | -     |

I tried df.pivot("Name", "Category"), but that uses the category values (e.g. A, B, ...) as the column names. I want the 5 columns to be predefined as "Cat 1" to "Cat 5" instead, and I'm not sure how to get that result. Also, not all names have 5 rows. For example, Cain has only a top 2, which means the Cat 3, Cat 4 and Cat 5 columns should be null or "-". Any help? Thanks!

Update:

For example, if every name has only 2 category records, I still want to get 5 new columns for the top 5 categories (i.e. Cat 1, Cat 2, Cat 3, Cat 4, Cat 5).

Now if I do

df["g"] = top5_jmi.groupby("Name").cumcount().add(1)

This will only give me 2 columns if I pivot it later. How can I get 5 columns? E.g.

| Name | Category | Amount |
|------|----------|--------|
| Abel | A        | 9.2    |
| Abel | B        | 3      |
| Cain | W        | 93     |
| Cain | A        | 2      |

should still give me this:

| Name | Cat 1 | Cat 2 | Cat 3 | Cat 4 | Cat 5 |
|------|-------|-------|-------|-------|-------|
| Abel | A     | B     | -     | -     | -     |
| Cain | W     | A     | -     | -     | -     |
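The problem described in the update can be reproduced with a short sketch. It assumes pandas is installed and rebuilds the two-category sample by hand; the variable name `wide` is only illustrative, and `pivot` is called with keyword arguments.

```python
import pandas as pd

# Two-category sample from the update, already sorted by Amount within each Name
df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Cain", "Cain"],
    "Category": ["A", "B", "W", "A"],
    "Amount": [9.2, 3, 93, 2],
})

# Position of each row within its Name group, starting at 1
df["g"] = df.groupby("Name").cumcount().add(1)

# Pivot on the counter: columns exist only for counter values that occur
wide = df.pivot(index="Name", columns="g", values="Category")
print(wide.columns.tolist())
```

Because the counter never goes above 2 here, the pivot has no way to know that columns 3 to 5 are wanted.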

1 Answer

Use:

# create a counter column, used later for the column names
df['g'] = df.groupby('Name').cumcount().add(1)
# keep only the top 5 rows per Name
df = df[df['g'] <= 5]
# reshape with pivot (keyword arguments are required in pandas 2.0+)
df2 = (df.pivot(index='Name', columns='g', values='Category')
         .add_prefix('Type ')
         .reset_index()
         .rename_axis(None, axis=1)
         .fillna('-'))
print(df2)
   Name Type 1 Type 2 Type 3 Type 4 Type 5
0  Abel      A      B      C      E      X
1  Cain      W      A      -      -      -

EDIT: Use DataFrame.reindex to add the missing columns:

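df['g'] = df.groupby('Name').cumcount().add(1)
# keep only the top 5 rows per Name
df = df[df['g'] <= 5]
# reshape with pivot (keyword arguments are required in pandas 2.0+)
df2 = (df.pivot(index='Name', columns='g', values='Category')
         # force columns 1..5 to exist, even if no Name has that many rows
         .reindex(range(1, 6), axis=1)
         .add_prefix('Type ')
         .reset_index()
         .rename_axis(None, axis=1)
         .fillna('-'))
print(df2)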
   Name Type 1 Type 2 Type 3 Type 4 Type 5
0  Abel      A      B      -      -      -
1  Cain      W      A      -      -      -
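For convenience, the steps above can be wrapped into one self-contained sketch. The helper name `top_n_wide` and its `n` and `prefix` parameters are hypothetical, not part of pandas; the prefix defaults to "Cat " only because that is what the question asks for. It assumes the frame is already sorted by Amount within each Name, as in the question.

```python
import pandas as pd


def top_n_wide(df, n=5, prefix="Cat "):
    """Hypothetical helper: one row per Name, always n category columns."""
    # Counter of each row within its Name group, starting at 1
    ranked = df.assign(g=df.groupby("Name").cumcount().add(1))
    # Keep only the first n rows per Name
    ranked = ranked[ranked["g"] <= n]
    return (ranked.pivot(index="Name", columns="g", values="Category")
                  # Force columns 1..n to exist even if no Name has n rows
                  .reindex(columns=range(1, n + 1))
                  .add_prefix(prefix)
                  .reset_index()
                  .rename_axis(None, axis=1)
                  .fillna("-"))


# Two-category sample from the question's update
df = pd.DataFrame({
    "Name": ["Abel", "Abel", "Cain", "Cain"],
    "Category": ["A", "B", "W", "A"],
    "Amount": [9.2, 3, 93, 2],
})

result = top_n_wide(df)
print(result)
```

Using `assign` instead of writing to `df['g']` leaves the original frame unchanged, which avoids a leftover helper column if the frame is reused later.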

3 Comments

I just checked. If every Name has only 1 type, or fewer than 5, the result has only a few columns, e.g. Type 1 and Type 2 and no more. But I still want Type 3, Type 4 and Type 5 with all values as "-". This happens because cumcount() is based on the number of rows per Name. Is there a way to fix that at 5 columns per Name, with "-" where there are no rows?
@AhSheng - Not sure I understand. What is the rule for skipping the first or second values and getting only the 3rd, 4th and 5th values? Can you explain more?
I think in the sample data the values for Cain are in the first and second columns. What should change so that, e.g., the 3rd and 5th columns are filled with W and A?
