I am having issues understanding how X and y are referenced for training.
I have a simple CSV file with 5 numeric columns that I am loading into a NumPy array as follows:
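import urllib.request
import numpy as np

url = "http://www.xyz/shortDataFinal.data"
# download the file (Python 3; in Python 2 this call was urllib.urlopen)
raw_data = urllib.request.urlopen(url)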
# load the CSV file as a numpy matrix
dataset = np.loadtxt(raw_data, delimiter=",")
print(dataset.shape)
# separate the data from the target attributes
X = dataset[:,0:3] #Does this mean columns 1-4?
y = dataset[:,4] #Is this the 5th column?
I think I am referencing my X values incorrectly.
Here is what I need:
The X values should be columns 1-4 and the y value should be the last column, which is the 5th. If I understand correctly, that means array indices 0:3 for the X values and index 4 for y, as I have done above. However, the result isn't correct: the values returned by the array don't match the values in the data; they are off by one column (index).
Should the slice be 0:4 (to get 4 columns)?
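For reference, here is a minimal sketch of how NumPy column slicing behaves. A slice start:stop includes start but excludes stop, so 0:3 selects only three columns. The small array below is made up for illustration and stands in for the real CSV data; the names three_cols, X_alt and y_alt are hypothetical.

```python
import numpy as np

# Hypothetical 3x5 array standing in for the CSV data (values are made up).
dataset = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
])

# A slice start:stop includes start but excludes stop.
three_cols = dataset[:, 0:3]  # indices 0, 1, 2: first three columns only
X = dataset[:, 0:4]           # indices 0, 1, 2, 3: first four columns
y = dataset[:, 4]             # index 4: fifth (last) column

print(three_cols.shape)  # (3, 3)
print(X.shape)           # (3, 4)
print(y.shape)           # (3,)

# Equivalent form that does not hard-code the number of columns.
X_alt = dataset[:, :-1]  # every column except the last
y_alt = dataset[:, -1]   # the last column
```

The negative-index form can be handy if the number of feature columns changes later, since it always treats the last column as the target.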