Just trying to set up a simple linear regression test based on the following example.

Here is my code:

# Normalize customer data
x_array = np.array(CustomerRFM['recency'])
normalized_X = preprocessing.normalize([x_array])
y_array = np.array(CustomerRFM['monetary_value'])
normalized_Y = preprocessing.normalize([y_array])

print('normalized_X: ' + str(np.count_nonzero(normalized_X)))
print('normalized_Y: ' + str(np.count_nonzero(normalized_Y)))

X_train, X_test = train_test_split(normalized_X, test_size=0.2)
Y_train, Y_test = train_test_split(normalized_Y, test_size=0.2)

print('X_train: ' + str(np.count_nonzero(X_train)))
print('Y_train: ' + str(np.count_nonzero(Y_train)))

regr = LinearRegression()
regr.fit(X_train, Y_train)

I have added the four print() lines as I am getting a strange issue. The console print of these four lines is:

normalized_X: 4304
normalized_Y: 4338
X_train: 0
Y_train: 0

For some reason, when I split the data into training and test sets, the training arrays come back empty.

I get the following error on the regr.fit() line:

ValueError: Found array with 0 sample(s) (shape=(0, 4339)) while a minimum of 1 is required.

This tells me there is something wrong with the X values, but I don't know what.

UPDATE: Change to print(array.shape)

If I change my code to use

print('normalized_X: ' + str(normalized_X.shape))
print('normalized_Y: ' + str(normalized_Y.shape))

and this:

print('X_train: ' + str(X_train.shape))
print('Y_train: ' + str(Y_train.shape))

I get:

normalized_X: (1, 4339)
normalized_Y: (1, 4339)

and this:

X_train: (0, 4339)
Y_train: (0, 4339)
  • Before counting non-zero values, did you just print(X_train) and print(Y_train) to see what's inside? Commented Jan 1, 2019 at 18:45
  • @Bazingaa - looks like they're both empty arrays. Commented Jan 1, 2019 at 18:52
  • More helpful than print(np.count_nonzero(array)) would be print(array.shape). count_nonzero flattens dimensions and ignores zero values, two features that are counterproductive here. shape is where a lot of the tricky things happen. Commented Jan 1, 2019 at 18:54
  • but I don't understand why as both normalized X and Y have data Commented Jan 1, 2019 at 18:54
  • X_train, X_test = train_test_split(np.transpose(normalized_X), test_size=0.2)
    Y_train, Y_test = train_test_split(np.transpose(normalized_Y), test_size=0.2)
    Commented Jan 1, 2019 at 19:01

1 Answer

It looks like you're using preprocessing.normalize incorrectly. By wrapping x_array in square brackets ([x_array]), you're passing it a 2-D array of shape (1, 4339): one row holding all 4339 values.
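A minimal sketch of the shape difference, using NumPy only. The four values are hypothetical stand-ins for CustomerRFM['recency'], not the asker's data:

```python
import numpy as np

# Hypothetical stand-in for CustomerRFM['recency'] (illustrative values only)
x_array = np.array([3.0, 10.0, 25.0, 40.0])

# Wrapping the 1-D array in a list adds a leading axis: one row, many columns
wrapped = np.array([x_array])

# Reshaping to a column gives one row per value instead
column = x_array.reshape(-1, 1)

print(wrapped.shape)  # (1, 4)
print(column.shape)   # (4, 1)
```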

According to the docs, preprocessing.normalize expects an array of shape [n_samples, n_features]. In your example, n_samples is 1 and n_features is 4339 which I don't think is what you want! You're then asking train_test_split to split a data set of one sample, so it understandably returns an empty array.
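One possible fix, sketched under assumptions: the data below is randomly generated as a stand-in for CustomerRFM, and the variable names are illustrative. Each array is reshaped to a single column so that every customer is one sample. Because normalize works row by row by default, axis=0 is passed here so the whole column is scaled to unit norm, which gives the same values as the original normalize([x_array]) call. X and y are also split in a single train_test_split call; splitting them in two separate calls shuffles each one independently, so the rows would no longer line up.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for CustomerRFM['recency'] and CustomerRFM['monetary_value']
rng = np.random.default_rng(0)
recency = rng.integers(1, 365, size=100).astype(float)
monetary_value = 5.0 * recency + rng.normal(0.0, 10.0, size=100)

# One row per customer, one column per feature: shape (n_samples, 1).
# axis=0 scales the column as a whole rather than each single-value row.
X = preprocessing.normalize(recency.reshape(-1, 1), axis=0)
y = preprocessing.normalize(monetary_value.reshape(-1, 1), axis=0).ravel()

# Split X and y together so each X row stays paired with its y value
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)

regr = LinearRegression()
regr.fit(X_train, y_train)
```

Transposing the existing arrays, as suggested in the comments, produces the same (n_samples, 1) shape; the single split call is an additional suggestion and not part of the original answer.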
