I recently created a dataset class and am having trouble modifying the data in a batch so that the change is reflected in future batches and in the original data.
I have the following dataset class:
class KrakenDataSet(Dataset):
    """Creates a tensor Xt as defined by equation 18 in the paper to feed into the ANN

    Parameters
    ----------
    Dataset : _type_
        Base class from torch.utils.data that allows iterating over the dataset
    """

    def __init__(
        self,
        portfolio: Portfolio,
        pvm: torch.Tensor,
        window_size: int = 50,
        step_size: int = 1,
        device="mps",
    ):
        self.portfolio = portfolio
        self.window_size = window_size
        self.step_size = step_size
        self.close_pr = torch.tensor(
            self.portfolio.get_close_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.high_pr = torch.tensor(
            self.portfolio.get_high_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.low_pr = torch.tensor(
            self.portfolio.get_low_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.pvm = pvm

    def __len__(self):
        return self.portfolio.nobs

    def __getitem__(self, idx):
        msecurities = self.portfolio.m_noncash_securities
        start = idx * self.step_size
        end = start + self.window_size
        xt = torch.zeros(3, msecurities, self.window_size)
        xt[0] = (self.close_pr[start:end] / self.close_pr[end - 1]).T
        xt[1] = (self.high_pr[start:end] / self.close_pr[end - 1]).T
        xt[2] = (self.low_pr[start:end] / self.close_pr[end - 1]).T
        return xt, self.pvm[end - 2]
The purpose of this class is to take the closing, high, and low prices over the last window_size observations and return the current xt. I have a pvm variable which holds all the weights.
When I do this:
pvm = (torch.ones(nobs, msecurities) / msecurities).to(device)
kraken_ds = KrakenDataSet(port, pvm, window_size=WINDOW_SIZE, device=device)

for xt, wt in kraken_ds:
    wt[0] = 5
I am able to see the change in pvm[48], which is exactly the row I want to modify (with window_size=50 and idx=0, __getitem__ returns self.pvm[48]).
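If I understand PyTorch indexing correctly, this works because self.pvm[end - 2] uses basic (integer) indexing, which returns a view that shares storage with pvm, so writing into wt writes straight through. A minimal standalone check of that assumption (toy tensor, not my actual data):

import torch

pvm = torch.ones(4, 3)
wt = pvm[2]                                 # basic indexing -> a view of row 2
wt[0] = 5.0                                 # write through the view
print(pvm[2, 0])                            # tensor(5.) -- visible in pvm
print(wt.data_ptr() == pvm[2].data_ptr())   # True -- same underlying storage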
But the moment I put this into a batch, the same modification doesn't propagate to the original pvm:
kraken_dl = DataLoader(
    kraken_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    drop_last=False,
    # generator=torch.Generator(device="mps"),
)

for xt, wt in kraken_dl:
    wt[0][0] = 5
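A quick check also suggests the batch tensor no longer shares storage with pvm (the default collate_fn stacks the samples into a newly allocated batch tensor, if I understand it correctly):

xt, wt = next(iter(kraken_dl))
# with WINDOW_SIZE = 50, the first sample in the batch comes from pvm[48]
print(wt[0].data_ptr() == pvm[48].data_ptr())  # False -- the batch is a copy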
I searched around and I'm not sure whether the DataLoader creates a deep copy of the samples, which would explain why modifying the batch doesn't modify the original dataset. Can anyone suggest a workaround? Thanks.
In short: I want changes made inside a DataLoader batch to modify the original dataset.
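The only workaround I've come up with so far is to also return the row index from __getitem__ and write the updated weights back into pvm explicitly after collation. A self-contained sketch of the idea on a toy dataset (the name IndexedDataset is just for illustration, not my real class):

import torch
from torch.utils.data import Dataset, DataLoader

class IndexedDataset(Dataset):
    """Toy dataset that also returns each sample's row index so the
    caller can write updated weights back into pvm after collation."""
    def __init__(self, pvm: torch.Tensor):
        self.pvm = pvm

    def __len__(self):
        return self.pvm.shape[0]

    def __getitem__(self, idx):
        return self.pvm[idx], idx  # weight row plus its index in pvm

pvm = torch.ones(8, 3) / 3
dl = DataLoader(IndexedDataset(pvm), batch_size=4, shuffle=False)

for wt, rows in dl:
    wt[:, 0] = 5.0     # modifies only the collated copy...
    pvm[rows] = wt     # ...so write it back using the collected indices

print(pvm[0])  # tensor([5.0000, 0.3333, 0.3333]) -- the change persisted

Is there a cleaner way than this explicit write-back?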