Summary
I am using Python 2.7. I have a data frame in which every column is categorical, i.e. the data type is string. I would like to turn the unique values of one column into separate columns, and fill each new column with the corresponding values from another column. A reproducible data frame and the expected output are given below.

The data frame that needs transposing can be created as follows:

import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'

The data frame that needs transposing looks like this:

df
   codes variables   string
0  codeA     textA  string1
1  codeB     textA  string1
2  codeC     textB  string1

The expected final output should look like this:

textA textB string
codeA       string1
codeB
      codeC string1

Note: The objective is transposition. I am not overly concerned whether the blank spaces are NULL values or zeroes.

  • I don't think I fully understand. Pandas has transpose .T, however I don't think this is what you want. What do you mean by: "I would like to transform unique row values of one column into multiple columns."? Your expected output also makes no sense to me. Commented Apr 5, 2020 at 5:07
  • @OceanScientist, I agree the title could be better worded. To explain the example above: first, the distinct values in the column variables must become separate columns. Second, the corresponding values in the column codes must be used to populate these new columns. The input and output of this transformation are given in the example. I wish I could be more articulate about it. I have looked at transpose(), pivot() and pivot_table(), but all the examples involve a numeric column that gets aggregated. My problem is that all the columns are strings. Commented Apr 5, 2020 at 5:45

1 Answer

I'm not sure about the last column in your example, as it seems inconsistent with the rest of the transformation. In any case, I think converting the variables column with the pandas get_dummies function is a good place to start.

import pandas as pd
codes = ['codeA','codeB', 'codeC']
variables = ['textA','textA','textB']
dataset = list(zip(codes,variables))
df = pd.DataFrame(data = dataset, columns=['codes','variables'])
df['string'] = 'string1'

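# One-hot encode 'variables' into the indicator columns variables_textA / variables_textB
df = pd.get_dummies(df, columns=['variables'])
# Where an indicator is set, substitute the row's code; fill with 0 elsewhere
df['variables_textA'] = df['codes'].where(df['variables_textA'].astype(bool), 0)
df['variables_textB'] = df['codes'].where(df['variables_textB'].astype(bool), 0)
# Keep only the new columns and the untouched 'string' column
columns = ['variables_textA', 'variables_textB', 'string']
df = df[columns]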

Result

  variables_textA variables_textB   string
0           codeA               0  string1
1           codeB               0  string1
2               0           codeC  string1
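
If the number of distinct values in variables is not known in advance, the same idea can be written as a loop instead of one line per column. This is only a sketch, assuming Python 3 and a reasonably recent pandas (the question uses Python 2.7). The helper name spread_codes and its parameters key, value and fill are made up for illustration, and it fills the gaps with empty strings rather than 0.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "codes": ["codeA", "codeB", "codeC"],
    "variables": ["textA", "textA", "textB"],
    "string": "string1",
})


def spread_codes(frame, key="variables", value="codes", fill=""):
    """Hypothetical helper: one new column per distinct value of `key`,
    holding the row's `value` where the row matches and `fill` elsewhere."""
    out = pd.DataFrame(index=frame.index)
    for category in sorted(frame[key].unique()):
        # Keep the code only on rows whose key equals this category
        out[category] = frame[value].where(frame[key] == category, fill)
    # Carry over every column that was neither the key nor the value
    other = [c for c in frame.columns if c not in (key, value)]
    return out.join(frame[other])


wide = spread_codes(df)
print(wide)
```

The new columns are named after the category values themselves (textA, textB) rather than variables_textA and variables_textB, which is closer to the layout asked for in the question.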

1 Comment

Thanks @le_camerone. It works perfectly fine. I appreciate your help. I have accepted your response and have voted but Vote won't show as I have less than 15 reputation :(
