
I recently created a dataset class and am having trouble modifying the data in a batch so that the change is reflected in future batches and in the original data.

I have the following dataset class:

class KrakenDataSet(Dataset):
    """Creates a tensor Xt, as defined by equation 18 in the paper, to feed into the ANN.

    Parameters
    ----------
    Dataset : torch.utils.data.Dataset
        Base class from torch.utils.data that allows iteration over the dataset.
    """

    def __init__(
        self,
        portfolio: Portfolio,
        pvm: torch.Tensor,
        window_size: int = 50,
        step_size: int = 1,
        device="mps",
    ):
        self.portfolio = portfolio
        self.window_size = window_size
        self.step_size = step_size
        self.close_pr = torch.tensor(
            self.portfolio.get_close_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.high_pr = torch.tensor(
            self.portfolio.get_high_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.low_pr = torch.tensor(
            self.portfolio.get_low_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.pvm = pvm

    def __len__(self):
        return self.portfolio.nobs

    def __getitem__(self, idx):
        msecurities = self.portfolio.m_noncash_securities
        start = idx * self.step_size
        end = start + self.window_size
        # Allocate xt on the same device as the price tensors to avoid a
        # CPU/MPS device mismatch when assigning into it below.
        xt = torch.zeros(3, msecurities, self.window_size, device=self.close_pr.device)
        xt[0] = (self.close_pr[start:end] / self.close_pr[end - 1]).T
        xt[1] = (self.high_pr[start:end] / self.close_pr[end - 1]).T
        xt[2] = (self.low_pr[start:end] / self.close_pr[end - 1]).T
        return xt, self.pvm[end - 2]

The purpose of this class is to take the closing, high, and low prices from the last window_size observations and return the current xt. The pvm variable holds all the weights.

When I do this:

pvm = (torch.ones(nobs, msecurities) / msecurities).to(device)
kraken_ds = KrakenDataSet(port, pvm, window_size=WINDOW_SIZE, device=device)
for xt, wt in kraken_ds:
    wt[0] = 5

I can see the change in pvm[48], which is the index I want to modify.

But the moment I put this into a batch, the same modification doesn't propagate back to the original pvm:

kraken_dl = DataLoader(
    kraken_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    drop_last=False,
    # generator=torch.Generator(device="mps"),
)

for xt, wt in kraken_dl:
    wt[0][0] = 5

I've searched around and I'm not sure whether the DataLoader creates a deep copy of the dataset, which would explain why I'm unable to modify the original dataset through the batch. Can anyone suggest a workaround? Thanks.
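For what it's worth, here is a minimal reproduction of the behavior I'm seeing, using a toy dataset instead of my Portfolio data. My understanding (which may be wrong) is that the DataLoader's default collate_fn stacks the per-sample tensors into a fresh batch tensor, so writes to the batch never reach the underlying storage, whereas iterating the dataset directly yields views:

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    def __init__(self, pvm):
        self.pvm = pvm

    def __len__(self):
        return len(self.pvm)

    def __getitem__(self, idx):
        # Indexing a tensor returns a view into the same storage.
        return self.pvm[idx]


pvm = torch.zeros(4, 3)
ds = ToyDataset(pvm)

# Iterating the dataset directly: the write is visible in pvm.
for row in ds:
    row[0] = 5
    break
direct_write_visible = pvm[0, 0].item()  # 5.0 -- the view was modified in place

# Through a DataLoader, samples are collated (stacked) into a new batch
# tensor, so the same write does not propagate back.
pvm.zero_()
for batch in DataLoader(ds, batch_size=2):
    batch[0, 0] = 5
    break
batch_write_visible = pvm[0, 0].item()  # 0.0 -- the batch was a copy

print(direct_write_visible, batch_write_visible)
```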

