How to reorder array in Python with / without Pandas?

Question

To look for correlations between products and categories, and later visualize them as heatmaps, I need to reshape a table, using Python with or without Pandas or other libraries, from this:

Book Name, Category 1, Category 2, Category 3
Django 101, Python, Web-Dev, Beginner
ROR Guide, Rails, Web-Dev, Intermediate
Laravel, PHP, Web-Dev, Intermediate

into that:

Book Name, Python, Web-Dev, Beginner, Rails, PHP, Intermediate
Django 101, True, True, True, False, False, False
ROR Guide, False, True, False, True, False, True
Laravel, False, True, False, False, True, True

Is there any way to do that? The data is stored in a .csv file and read with pandas.read_csv().

Maybe add some information on what kind of objects are in the array? Is this an array of arrays?
– Daniel Slater, Commented Jun 19, 2015 at 13:23

Alexander · Accepted Answer · 2015-06-19 14:18:34Z


This can be done using the get_dummies function in Pandas.

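import pandas as pd

df = pd.DataFrame({
    'Book Name': ['Django 101', 'ROR Guide', 'Laravel'],
    'Category 1': ['Python', 'Rails', 'PHP'],
    'Category 2': ['Web-Dev'] * 3,
    'Category 3': ['Beginner', 'Intermediate', 'Intermediate'],
})

# One indicator column per distinct value in each category column.
# dtype=int keeps the 0/1 output shown below; pandas 2.0+ defaults to booleans.
dummies = pd.concat([pd.get_dummies(df[c], dtype=int) for c in df.columns[1:]], axis=1)
df_new = pd.concat([df['Book Name'], dummies], axis=1)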

>>> df_new
    Book Name  PHP  Python  Rails  Web-Dev  Beginner  Intermediate
0  Django 101    0       1      0        1         1             0
1   ROR Guide    0       0      1        1         0             1
2     Laravel    1       0      0        1         0             1

Alternatively, set the DataFrame's index to the book name:

df.set_index('Book Name', inplace=True)
# Every remaining column is a category column, so iterate over all of them.
df_new = pd.concat([pd.get_dummies(df[c], dtype=int) for c in df], axis=1)
>>> df_new
            PHP  Python  Rails  Web-Dev  Beginner  Intermediate
Book Name                                                      
Django 101    0       1      0        1         1             0
ROR Guide     0       0      1        1         0             1
Laravel       1       0      0        1         0             1
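
The question asks for True/False values and reads the data from a .csv file. A minimal sketch of that variant, assuming the file has the four columns shown in the question. An in-memory string stands in for the real file here, and `dtype=bool` is passed explicitly so the result does not depend on the pandas version; the names `csv_text` and `flags` are illustrative only.

```python
import io

import pandas as pd

# Stand-in for the real .csv file; the contents are copied from the question.
csv_text = """Book Name,Category 1,Category 2,Category 3
Django 101,Python,Web-Dev,Beginner
ROR Guide,Rails,Web-Dev,Intermediate
Laravel,PHP,Web-Dev,Intermediate
"""

# Use the book name as the index so only category columns remain.
df = pd.read_csv(io.StringIO(csv_text), index_col="Book Name")

# One boolean indicator column per distinct value in each category column.
flags = pd.concat(
    [pd.get_dummies(df[c], dtype=bool) for c in df.columns], axis=1
)
print(flags)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path.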

edited Jun 19, 2015 at 14:18

answered Jun 19, 2015 at 13:54

Alexander


3 Comments

Sergei Over a year ago

Unfortunately, my data looks like this:

Book Name, Category 1, Category 2, Category 3
Django 101, Python, Web-Dev, Beginner
ROR Guide, Rails, Intermediate, Web-Dev
Laravel, Beginner, Web-Dev, PHP

so it produces duplicate columns.

Sergei Over a year ago

This does not work quite right, because the same category can appear in different columns, which produces duplicate columns:

df = pd.DataFrame({'Book Name': ['Django 101', 'ROR Guide', 'Laravel'], 'Category 1': ['Python', 'Intermediate', 'PHP'], 'Category 2': ['Web-Dev', 'Web-Dev', 'Intermediate'], 'Category 3': ['Beginner', 'Rails', 'Web-Dev']})

Is there any way to avoid duplicate columns?

Alexander Over a year ago

@sergei It is up to you to define the categorization. To ensure uniqueness across categories, you can prepend each name in the column with an identifier, e.g. cat1_beginner will be different than cat2_beginner.
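
Two ways to handle the mixed-column data from the comments above, as a sketch rather than a definitive recipe. The first follows the prefixing suggestion: when `get_dummies` is given a whole DataFrame, it prefixes each indicator with its source column name, so `Category 1_Intermediate` and `Category 2_Intermediate` stay distinct. The second is a different technique that is not part of the answer: it treats a category as the same tag regardless of which column it appears in, by melting the category columns into one long column and cross-tabulating. The names `prefixed`, `long_df`, `Tag` and `merged` are illustrative.

```python
import pandas as pd

# Data from the comment: the same category can appear in any column.
df = pd.DataFrame({
    'Book Name': ['Django 101', 'ROR Guide', 'Laravel'],
    'Category 1': ['Python', 'Intermediate', 'PHP'],
    'Category 2': ['Web-Dev', 'Web-Dev', 'Intermediate'],
    'Category 3': ['Beginner', 'Rails', 'Web-Dev'],
}).set_index('Book Name')

# Option 1: keep columns distinct by prefixing with the source column name,
# giving names such as 'Category 1_PHP' and 'Category 2_Intermediate'.
prefixed = pd.get_dummies(df, dtype=int)

# Option 2: one column per tag, whichever category column it came from.
# melt produces one (book, tag) row per cell; crosstab counts each pair.
long_df = df.reset_index().melt(id_vars='Book Name', value_name='Tag')
merged = pd.crosstab(long_df['Book Name'], long_df['Tag']) > 0
merged = merged.reindex(df.index)  # crosstab sorts rows; restore book order
print(merged)
```

Option 2 yields exactly one column per distinct tag, which is usually what a heatmap of books against categories needs; option 1 is preferable when the column position itself carries meaning.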
