I think this question has been asked a few times already, but I have yet to find a good answer here.
I have a PyTorch Dataset built from 2 NumPy arrays.
The following are the dimensions.
features = [10000, 450, 28] NumPy array. dim_0 = number of samples, dim_1 = time series, dim_2 = features. Basically each sample is 450 frames long, each frame contains 28 features, and I have 10000 samples.
label = [10000, 450] NumPy array. dim_0 = number of samples, dim_1 = one label per frame.
The task is to do a classification for each frame.
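(For anyone who wants to reproduce the setup, the shapes above can be mocked with placeholder data; the 3-class label range is my assumption, not something from the post:)

```python
import numpy as np

# Placeholder data matching the shapes described in the post:
# 10000 samples x 450 frames x 28 features per frame.
features = np.zeros((10000, 450, 28), dtype=np.float32)

# One integer class label per frame; 3 classes is an arbitrary assumption.
label = np.random.randint(0, 3, size=(10000, 450))

print(features.shape)  # (10000, 450, 28)
print(label.shape)     # (10000, 450)
```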
I created a PyTorch custom Dataset and DataLoader with the following code.
label_length = label.size
label = torch.from_numpy(label)
features = torch.from_numpy(features)
train_dataset = Dataset(label, features, label_length)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
As expected, train_dataloader.dataset.data returns a tensor of size [10000, 450, 28]. Great! Now I just need to draw batches from the 10000 samples and loop. So I run the code below (assume that the optimizer and loss function are all set up).
train_loss = 0
EPOCHS = 3
for epoch_idx in range(EPOCHS):
    for i, data in enumerate(train_dataloader):
        inputs, labels = data
        print(inputs.size())
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
But I get this error:
ValueError: LSTM: Expected input to be 2D or 3D, got 4D instead
When I checked the dimension of inputs, it gave [64 x 10000 x 450 x 28]
Why does the DataLoader add this extra dimension? (I understand from the documentation that it is supposed to add a batch dimension, but shouldn't it take 64 samples out of the 10000, build a batch from them, and loop over the batches?)
I think I am making a mistake somewhere but cannot pinpoint what I am doing wrong...
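For reference, the 4D shape can be reproduced in isolation: if every `__getitem__` call returns the full 3D tensor instead of one sample, the default collation stacks `batch_size` copies of it. A scaled-down sketch (the class name and the shapes 10 x 5 x 3 with batch size 4 are all made-up stand-ins for [10000, 450, 28] and 64):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class WholeTensorDataset(Dataset):
    """Mimics a __getitem__ that ignores idx and returns everything."""
    def __init__(self, data):
        self.data = data
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        return self.data          # returns all samples, not self.data[idx]

data = torch.zeros(10, 5, 3)      # scaled-down stand-in for [10000, 450, 28]
loader = DataLoader(WholeTensorDataset(data), batch_size=4)
batch = next(iter(loader))
print(batch.shape)                # torch.Size([4, 10, 5, 3]) - 4D, not 3D
```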
EDIT: This is my simple Dataset class
class Dataset(torch.utils.data.Dataset):
    def __init__(self, label, data, length):
        self.labels = label
        self.data = data
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # need to create tensor
        #data = torch.from_numpy(self.data)
        #labels = torch.from_numpy(self.labels).type(torch.LongTensor)
        data = self.data
        labels = self.labels
        return data, labels
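For comparison, here is a sketch of a `__getitem__` that indexes into the tensors per sample (my own version of what the intended behavior might look like, not code from the original class; `FrameDataset` is a made-up name):

```python
import torch

class FrameDataset(torch.utils.data.Dataset):
    def __init__(self, label, data):
        # Convert once up front; int64 labels for classification losses.
        self.labels = torch.as_tensor(label).long()
        self.data = torch.as_tensor(data).float()

    def __len__(self):
        # Number of samples, i.e. dim 0 - not the total element count.
        return self.data.shape[0]

    def __getitem__(self, idx):
        # One sample: data[idx] is [450, 28], labels[idx] is [450].
        return self.data[idx], self.labels[idx]
```

With this, each batch from the DataLoader would come out as [64, 450, 28] inputs and [64, 450] labels, which matches what an LSTM with batch_first=True expects.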