I am new to machine learning and am trying to set up a logistic regression for prediction in Python using scikit-learn. I already set one up with a small mock dataset, but when I expanded the code to work with a larger dataset, I ran into a ValueError. Here is my code:

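import numpy as np
from sklearn.linear_model import LogisticRegression

# file and file2 are defined elsewhere (the data file and the label file)
inputData = np.genfromtxt(file, skip_header=1, unpack=True)
print("X array shape: ", inputData.shape)
inputAnswers = np.genfromtxt(file2, skip_header=1, unpack=True)
print("Y array shape: ", inputAnswers.shape)

logreg = LogisticRegression(penalty='l2', C=2.0)
logreg.fit(inputData, inputAnswers)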

The inputData 2D array (matrix) has 149 rows and 231 columns. I'm trying to fit it to the inputAnswers array, which has 149 rows, correctly corresponding to the 149 rows of the inputData array. However, here is the output I receive:

X array shape:  (231, 149)
Y array shape:  (149,)
Traceback (most recent call last):
File "LogRegTry_rawData.py", line 26, in <module>
logreg.fit(inputData, inputAnswers)
File "[path]", line 676, in fit
(X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 231 samples, but y has 149.

I understand what the error means, but I'm not sure why it shows up in this situation or how to fix it. Any help is greatly appreciated. Thank you!

1 Answer

In shape, the first element is the number of rows and the second is the number of columns. scikit-learn treats each row of X as one sample, so your X has 231 samples but y has only 149 labels. Try transposing your data: inputData.T
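As a minimal sketch of that fix, the snippet below uses random stand-in arrays with the same shapes as in the question. The data and labels are made up for illustration, and the penalty argument is left at its default (L2).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Stand-in for the array genfromtxt returned in the question: (231, 149)
inputData = rng.rand(231, 149)
# Stand-in binary labels, one per sample (149 of them)
inputAnswers = np.arange(149) % 2

# Transpose so that each row is one sample: (149, 231)
X = inputData.T
assert X.shape[0] == inputAnswers.shape[0]

# L2 is the default penalty, so only C is set here
logreg = LogisticRegression(C=2.0)
logreg.fit(X, inputAnswers)

print("X shape:", X.shape)
print("coef_ shape:", logreg.coef_.shape)
```

With random data the fitted model is meaningless (and the solver may emit a convergence warning); the point is only that fit accepts X once its first dimension matches the number of labels.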

2 Comments

Thank you! I used the np.transpose() function, and this worked. I wonder why np.genfromtxt reads it "inverted," however...
unpack=True is what transposes the data: with it, each column of the file becomes a row of the returned array.
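The effect of unpack=True can be checked on a tiny in-memory example. The text below is a hypothetical stand-in for a data file (one header line, then 3 rows of 2 columns), read through io.StringIO instead of a real file.

```python
import io
import numpy as np

# Hypothetical file contents: a header line, then 3 rows of 2 columns
text = "a b\n1 2\n3 4\n5 6\n"

# Default: the array has the same layout as the file (rows x columns)
as_in_file = np.genfromtxt(io.StringIO(text), skip_header=1)

# unpack=True: the array is transposed (columns x rows)
unpacked = np.genfromtxt(io.StringIO(text), skip_header=1, unpack=True)

print(as_in_file.shape)  # (3, 2)
print(unpacked.shape)    # (2, 3)
```

So dropping unpack=True when loading the feature file should give the same result as transposing afterwards.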
