0

I have a nested list with string values that I used to create a list with binary values. I used the transformed list as predictors in my model.

The list with string values -

D = [["An", "Cn"], ["Bs", "Gt"], ["Cd", "El"], ["Cd", "Cn", "En"]]

With

D_tran = pd.Series([';'.join(i) for i in D]).str.get_dummies(';')

I obtained D_tran

   An  Bs  Cd  Cn  El  En  Gt
0   1   0   0   1   0   0   0
1   0   1   0   0   0   0   1
2   0   0   1   0   1   0   0
3   0   0   1   1   0   1   0

With

D_list = D_tran.values.tolist()

I obtained D_list:

[[1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 0, 1, 0]]

I use this to create a linear regression model. To test my model, however, I need to transform the string values in my test data to be binary. The test data looks like -

R = [["Bs"], ["Cd", "El"], ["An"]]

My question is how to map R into the frame of D_list in order to obtain

R = [[0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0, 0]] 

Please note that, in the test data, only part of the predictors appear.

Thank you very much for your assistance.

0

1 Answer 1

1

You can essentially do the same procedure as before with one minor modification: after creating the dummies dataframe, use reindex with the columns of D_tran:

R_tran = pd.Series([';'.join(i) for i in R]).str.get_dummies(';')
R_tran = R_tran.reindex(columns=D_tran.columns, fill_value=0)
R_list = R_tran.values.tolist()
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.