Just trying to set up a simple linear regression test based on the following example.
Here is my code:
# Normalize customer data
x_array = np.array(CustomerRFM['recency'])
normalized_X = preprocessing.normalize([x_array])
y_array = np.array(CustomerRFM['monetary_value'])
normalized_Y = preprocessing.normalize([y_array])
print('normalized_X: ' + str(np.count_nonzero(normalized_X)))
print('normalized_Y: ' + str(np.count_nonzero(normalized_Y)))
X_train, X_test = train_test_split(normalized_X, test_size=0.2)
Y_train, Y_test = train_test_split(normalized_Y, test_size=0.2)
print('X_train: ' + str(np.count_nonzero(X_train)))
print('Y_train: ' + str(np.count_nonzero(Y_train)))
regr = LinearRegression()
regr.fit(X_train, Y_train)
I have added the four print() lines as I am getting a strange issue. The console print of these four lines is:
normalized_X: 4304
normalized_Y: 4338
X_train: 0
Y_train: 0
For some reason, when I split the data into training and testing sets, the training arrays come back empty.
I get the following error on the regr.fit() line:
ValueError: Found array with 0 sample(s) (shape=(0, 4339)) while a minimum of 1 is required.
This tells me there is something wrong with the X values, but I don't know what.
UPDATE: Change to print(array.shape)
If I change my code to use
print('normalized_X: ' + str(normalized_X.shape))
print('normalized_Y: ' + str(normalized_Y.shape))
and this:
print('X_train: ' + str(X_train.shape))
print('Y_train: ' + str(Y_train.shape))
I get:
normalized_X: (1, 4339)
normalized_Y: (1, 4339)
and this:
X_train: (0, 4339)
Y_train: (0, 4339)
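The shapes point at a likely cause. Wrapping the array in a list, as in preprocessing.normalize([x_array]), produces a 2-D input with a single row, so scikit-learn treats it as 1 sample with 4339 features. train_test_split divides along the first axis, and a single row cannot be divided 80/20. A minimal sketch using NumPy only, with made-up values standing in for CustomerRFM['recency']:

```python
import numpy as np

# Hypothetical stand-in for CustomerRFM['recency']; the values are made up.
x_array = np.array([3.0, 10.0, 25.0, 7.0, 1.0])

# Wrapping the 1-D array in a list adds a leading axis: one row, five columns.
as_row = np.array([x_array])
print(as_row.shape)      # (1, 5)

# scikit-learn expects (n_samples, n_features), so a single feature is a column.
as_column = x_array.reshape(-1, 1)
print(as_column.shape)   # (5, 1)
```

With the column form, the first axis holds the samples, which is the axis train_test_split works on.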
Use print(X_train) and print(Y_train) to see what's inside.
Better than print(np.count_nonzero(array)) would be print(array.shape). count_nonzero will flatten dimensions and ignore zero values - two features that are counterproductive here. shape is where a lot of tricky exciting things happen.
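One possible way to set this up, as a sketch rather than a definitive fix: keep one row per customer, so X has shape (n_samples, 1), and split X and y in a single train_test_split call so the rows stay paired. Calling train_test_split separately on X and Y shuffles each one independently, which breaks the pairing. The data below is made up and stands in for the CustomerRFM columns. Normalization is left out on purpose: preprocessing.normalize scales each row to unit norm by default (axis=1), so on a single-column X it would turn every nonzero value into 1. If scaling is wanted, a per-feature scaler such as StandardScaler may be a better fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; in the question these come from CustomerRFM.
rng = np.random.default_rng(0)
recency = rng.integers(1, 365, size=100).astype(float)
monetary_value = 500.0 - recency + rng.normal(0.0, 5.0, size=100)

# One feature per sample: X has shape (n_samples, 1), y has shape (n_samples,).
X = recency.reshape(-1, 1)
y = monetary_value

# Split X and y together so each training row keeps its matching target.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, y_train.shape)  # (80, 1) (80,)

regr = LinearRegression()
regr.fit(X_train, y_train)
print(regr.score(X_test, y_test))  # R^2 on the held-out rows
```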