Logistic Regression => ValueError: setting an array element with a sequence

Question

I am new to machine learning, I am trying to apply logistic regression on my sample data set I have a single feature that contains a list of numbers and want to predict class.

the following is my code

from sklearn.linear_model import LogisticRegression
a = [[1,2,3], [1,2,3,4,5,6], [4,5,6,7], [0,0,0,7,1,2,3]]
b = [0,1,0, 0]
p = [[9,0,2,4]]

clfModel1 = LogisticRegression(class_weight='balanced')
clfModel1.fit(a,b)
clfModel1.predict(p)

I am getting the following error

Traceback (most recent call last):
  File "F:\python_3.4\NLP\t.py", line 7, in <module>
    clfModel1.fit(a,b)
  File "C:\Python34\lib\site-packages\sklearn\linear_model\logistic.py", line 1173, in fit
    order="C")
  File "C:\Python34\lib\site-packages\sklearn\utils\validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "C:\Python34\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
>>>

Is there some way to change the data such that I can the apply the classifier and predict the results

Your a is not a valid input - it is a staggered "matrix". In logistic regression, each feature needs to be a number, not a list. How is this suppose to work with logistic regression? — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Aug 3, 2017 at 19:05
I thought the same thing, Is there a way around, please help — M2skills
– M2skills, Commented Aug 3, 2017 at 19:09
Fundamentally, the fix you want to make the LogisticRegression.fit method work is to make all the sublists in a the same size, however, wihtout understanding your data there is no way to say how you could do that in a valid and useful way. — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Aug 3, 2017 at 19:13

lejlot · Accepted Answer · 2017-08-03 19:09:50Z

5

Logistic regression is an estimator for functions of form:

R^d -> [0,1]

But your data clearly is not a subset of R^d, as each sample in a has different length (number of dimensions), thus it cannot be applied.

Another problem is that p should be a list of samples too, not a single sample (and it has to have d dimensions too, of course).

There is no "way around this" it is simply a wrong idea. What is a typical solution to working with "odd" data:

you predefine your own, custom mapping (feature extraction step) that given your varied-length point outputs a fixed length representation (so outputs d numbers). There is no general way of doing that - everything depends on data.
there are models that can deal with varied length inputs, such as LSTMs, but it is a huge jump from logistic regression to recurrent neural nets.
use methods which are similarity-based (like kNN) and simply define your own measure of what it means that two "lists of numbers" are similar.

There is no other way - either rethink representation of your data, or change approach.

answered Aug 3, 2017 at 19:09

lejlot

67k9 gold badges138 silver badges168 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

M2skills Over a year ago

I get your point, I have changed p as you suggested but still there is no way to input a staggered data as input to classifier. Thanks

lejlot Over a year ago

Indeed, as described in the first part of the answer, LR cannot be applied to such data. Issue with p was a minor thing.

Collectives™ on Stack Overflow

Logistic Regression => ValueError: setting an array element with a sequence

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related