How to transform python data frame such that unique row values are transposed to columns and values of another column become their rows

Question

Summary
I am using Python 2.7. I have a data frame in which every column is categorical, i.e. the data type is string. I would like to turn the unique values of one column into separate columns, and populate each of those new columns with the corresponding values from another column. A reproducible data frame and the expected output are given below.

Dataframe that needs transposing can be created as follows:

import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'

The data frame that needs transposing looks like this:

df
   codes variables   string
0  codeA     textA  string1
1  codeB     textA  string1
2  codeC     textB  string1

The expected final output should look like this:

textA textB string
codeA       string1
codeB
      codeC string1

Note: The objective is transposition. I am not overly concerned whether the blank spaces are NULL values or zeroes.

I don't think I fully understand. pandas has a transpose, .T, but I don't think that is what you want. What do you mean by "I would like to transform unique row values of one column into multiple columns"? Your expected output also makes no sense to me.
– Ocean Scientist, Commented Apr 5, 2020 at 5:07
@OceanScientist, I agree the title could be better worded. To explain the example above: first, the values in the variables column must be turned into multiple columns. Second, the corresponding values in the codes column must be used to populate these new columns. The input and output of this transformation are given in the example. I wish I could be more articulate about it. I have seen the transpose(), pivot() and pivot_table() options, but all the examples involve a numeric column that gets aggregated. My problem is that all the columns are strings.
– SidharthMacherla, Commented Apr 5, 2020 at 5:45

le_camerone · Accepted Answer · 2020-04-05 10:08:37Z

I'm not sure about the last column in your example, as it seems inconsistent with the rest of the transformation. In any case, I think converting the variables column with the pandas get_dummies function is a good place to start.

import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'

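# One-hot encode 'variables' into the indicator columns variables_textA / variables_textB
df = pd.get_dummies(df, columns=['variables'])
# Where the indicator is set, keep that row's code; everywhere else fill with 0
df['variables_textA'] = df['codes'].where(df['variables_textA'].astype(bool), 0)
df['variables_textB'] = df['codes'].where(df['variables_textB'].astype(bool), 0)
# Keep only the new columns plus 'string'
columns = ['variables_textA', 'variables_textB', 'string']
df = df[columns]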
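
As an alternative to the get_dummies approach, and not part of the original answer, here is a minimal sketch using DataFrame.pivot. The question's comment mentions that pivot examples usually involve a numeric column; pivot itself only reshapes and does not aggregate, so it should work on string values as long as each (row index, variables) pair occurs once. The names wide and result are illustrative, and filling the gaps with an empty string is an assumption based on the note that blanks may be NULL or zero.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'codes': ['codeA', 'codeB', 'codeC'],
    'variables': ['textA', 'textA', 'textB'],
    'string': 'string1',
})

# pivot() reshapes without aggregating, so string values are fine.
# No index is passed, so the existing row index (0, 1, 2) is kept.
wide = df.pivot(columns='variables', values='codes')

# Drop the leftover 'variables' label on the column axis and fill the gaps.
wide.columns.name = None
wide = wide.fillna('')

# Re-attach the untouched 'string' column; rows align on the shared index.
result = wide.join(df['string'])
print(result)
```

Unlike the get_dummies version, this does not hard-code the column names, so it picks up any additional values that appear in variables. The resulting columns are named textA and textB rather than variables_textA and variables_textB.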


1 Comment

SidharthMacherla Over a year ago

Thanks @le_camerone. It works perfectly fine. I appreciate your help. I have accepted your response and have voted but Vote won't show as I have less than 15 reputation :(
