
I am building a CNN and get this error when I run:

from tensorflow.keras import utils
trainY=utils.to_categorical(trainY)

ValueError: setting an array element with a sequence.

My trainY values are actually labels, and they look like this:

labels
array([list(['noise']), list(['noise']), list(['noise', 'point_source']),
       list(['noise']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise']), list(['noise']),
       list(['noise', 'point_source']), list(['noise']),
       list(['noise', 'point_source']), list(['noise', 'point_source']),
       list(['noise']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise']), list(['noise']),

Any suggestions on how to fix this? Many thanks!

Answer

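You can do that with sklearn.preprocessing.MultiLabelBinarizer, which turns each sample's list of labels into one row of 0/1 indicators, with one column per distinct label.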

import numpy as np

# dtype=object is required because the label lists have different lengths;
# NumPy 1.24+ raises a ValueError when asked to build a ragged array without it.
labels = np.array([list(['noise']), list(['noise']), list(['noise', 'point_source']),
       list(['noise']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise']), list(['noise']),
       list(['noise', 'point_source']), list(['noise']),
       list(['noise', 'point_source']), list(['noise', 'point_source']),
       list(['noise']), list(['noise', 'point_source']),
       list(['noise', 'point_source']), list(['noise']), list(['noise'])],
       dtype=object)

That reproduces what you had. Now encode it with MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer

as_list = [list(i) for i in labels]

mlb = MultiLabelBinarizer()
ohe = mlb.fit_transform(as_list) # you might need to add .astype(float)

This is what you'll end up with:

array([[1, 0],
       [1, 0],
       [1, 1],
       [1, 0],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1], ...
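For intuition, here is a NumPy-only sketch of the same multi-hot encoding: collect the distinct label names, sort them (MultiLabelBinarizer also orders its columns by sorted label name when you don't pass classes), and put a 1 in each sample's row for every label it has. This illustrates the idea only; it is not sklearn's implementation, and the helper name multi_hot is hypothetical.

```python
import numpy as np

def multi_hot(samples):
    """Hypothetical helper: turn a sequence of label lists into a 0/1 matrix."""
    # Sorted list of every distinct label seen across all samples.
    classes = sorted({label for sample in samples for label in sample})
    index = {label: i for i, label in enumerate(classes)}
    # One row per sample, one column per class.
    out = np.zeros((len(samples), len(classes)), dtype=int)
    for row, sample in enumerate(samples):
        for label in sample:
            out[row, index[label]] = 1
    return out, classes

labels = [['noise'], ['noise'], ['noise', 'point_source'], ['noise']]
encoded, classes = multi_hot(labels)
print(classes)  # ['noise', 'point_source']
print(encoded)
# [[1 0]
#  [1 0]
#  [1 1]
#  [1 0]]
```

If your framework expects floating-point targets, pass dtype=float when building the matrix (or call .astype(float) on it), as with the MultiLabelBinarizer output.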

Comments

Hey Nicolas, thank you for your help! Now, when I try to do what you did, I get a new error when splitting into trainX and trainY: (trainX, testX, trainY, testY) = train_test_split(signals, labels, test_size=0.2, random_state=42) gives "Found input variables with inconsistent numbers of samples: [100, 154]". Does this mean they are no longer the same length? Any tips on how to fix that? I am trying to make a multi-label classifier. I really appreciate your help! @NicolasGervais
Maybe remove the parentheses? Or do the train/test split before using to_categorical? I'm unable to help you further. What's the error?
Hmm, even when I apply to_categorical after the train/test split, trainY now has shape (124, 2), whereas before it was (80,), the same as trainX. I also removed the parentheses. Strange; the error is: "Input arrays should have the same number of samples as target arrays. Found 80 input samples and 124 target samples." Thanks again for the help!
So is list(['noise', 'point_source']) supposed to be the target for one observation or for two?
See my updated answer. It doesn't seem possible to use targets of different lengths with to_categorical(), and I am not able to use Keras at the moment. I hope this solution works for you.
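The sample counts in the comments (100 signals vs. 154 labels, then 80 vs. 124) look like what happens when the per-sample label lists get flattened into one long list of label strings, so that every two-label sample contributes two entries. That is an assumption, since the thread doesn't show the code that built labels. The sketch below uses the 19 label lists from the answer to compare a flattened view, a ragged object array (one entry per sample, but each entry is a list, so there is no single integer class per sample as to_categorical expects), and a multi-hot matrix that keeps exactly one row per sample. signals is a hypothetical placeholder array.

```python
import numpy as np

# The 19 label lists from the answer above (N and NP are just shorthands).
N, NP = ['noise'], ['noise', 'point_source']
labels = [N, N, NP, N, NP, NP, NP, NP, N, N, NP, N, NP, NP, N, NP, NP, N, N]

# Hypothetical stand-in for the real inputs: one feature row per sample.
signals = np.zeros((len(labels), 8))

# Flattening merges samples: every two-label sample contributes two entries.
flat = [label for sample in labels for label in sample]
print(len(signals), len(flat))  # 19 29

# A ragged object array keeps one entry per sample, but each entry is a list,
# so it cannot be cast to one integer class index per sample.
ragged = np.array(labels, dtype=object)
try:
    np.asarray(ragged, dtype=np.int64)
    cast_ok = True
except (ValueError, TypeError):
    cast_ok = False
print(ragged.shape, cast_ok)  # (19,) False

# Multi-hot targets: one row per sample, one column per distinct label.
classes = sorted(set(flat))
targets = np.array([[int(c in sample) for c in classes] for sample in labels])
print(classes, targets.shape)  # ['noise', 'point_source'] (19, 2)
```

Because targets has the same first dimension as signals, train_test_split(signals, targets, test_size=0.2, random_state=42) should accept the pair without the "inconsistent numbers of samples" error.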