I have a nested list of string values that I used to create a list of binary values, which I then used as predictors in my model.
The list with string values:
D = [["An", "Cn"], ["Bs", "Gt"], ["Cd", "El"], ["Cd", "Cn", "En"]]
With
D_tran = pd.Series([';'.join(i) for i in D]).str.get_dummies(';')
I obtained D_tran:
   An  Bs  Cd  Cn  El  En  Gt
0   1   0   0   1   0   0   0
1   0   1   0   0   0   0   1
2   0   0   1   0   1   0   0
3   0   0   1   1   0   1   0
With
D_list = D_tran.values.tolist()
I obtained D_list:
[[1, 0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1], [0, 0, 1, 0, 1, 0, 0], [0, 0, 1, 1, 0, 1, 0]]
I use this to create a linear regression model. To test the model, however, I need to transform the string values in my test data into the same binary format. The test data looks like this:
R = [["Bs"], ["Cd", "El"], ["An"]]
My question is how to map R onto the columns of D_tran (i.e. the same column order as in D_list) in order to obtain:
R = [[0, 1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 1, 0, 0], [1, 0, 0, 0, 0, 0, 0]]
Please note that only some of the predictors appear in the test data.
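One possible approach, sketched here with pandas only and not checked against the actual model code: encode R with the same str.get_dummies call used for D, then call DataFrame.reindex with the training columns so that missing predictors are added as all-zero columns in the same order as D_tran. The names R_tran and R_list are introduced purely for illustration.

```python
import pandas as pd

# Training data, encoded exactly as in the question.
D = [["An", "Cn"], ["Bs", "Gt"], ["Cd", "El"], ["Cd", "Cn", "En"]]
D_tran = pd.Series([";".join(i) for i in D]).str.get_dummies(";")
D_list = D_tran.values.tolist()

# Test data: only some of the training predictors appear.
R = [["Bs"], ["Cd", "El"], ["An"]]

# Encode R the same way, then reindex its columns to the training columns:
# predictors absent from R are added and filled with 0, and the column
# order is forced to match D_tran (and therefore D_list).
R_tran = (
    pd.Series([";".join(i) for i in R])
    .str.get_dummies(";")
    .reindex(columns=D_tran.columns, fill_value=0)
)
R_list = R_tran.values.tolist()
print(R_list)
```

A side effect of reindex is that any token that appears in R but never appeared in D is silently dropped. That is usually what a fitted model needs, since it has no coefficient for such a column, but it is worth knowing if unseen categories can occur.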
Thank you very much for your assistance.