I recently created a dataset class and am having trouble modifying the data in a batch so that the change is reflected in future batches and in the original data.
I have the following dataset class:
class KrakenDataSet(Dataset):
    """Creates a tensor Xt as defined by equation 18 in the paper to feed into the ANN

    Parameters
    ----------
    Dataset : _type_
        Base class from torch.utils.data that allows iterating over the dataset
    """

    def __init__(
        self,
        portfolio: Portfolio,
        pvm: torch.Tensor,
        window_size: int = 50,
        step_size: int = 1,
        device="mps",
    ):
        self.portfolio = portfolio
        self.window_size = window_size
        self.step_size = step_size
        self.close_pr = torch.tensor(
            self.portfolio.get_close_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.high_pr = torch.tensor(
            self.portfolio.get_high_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.low_pr = torch.tensor(
            self.portfolio.get_low_price().values[:, 1:], dtype=torch.float32
        ).to(device)
        self.pvm = pvm

    def __len__(self):
        return self.portfolio.nobs

    def __getitem__(self, idx):
        msecurities = self.portfolio.m_noncash_securities
        start = idx * self.step_size
        end = start + self.window_size
        xt = torch.zeros(3, msecurities, self.window_size)
        xt[0] = (self.close_pr[start:end] / self.close_pr[end - 1]).T
        xt[1] = (self.high_pr[start:end] / self.close_pr[end - 1]).T
        xt[2] = (self.low_pr[start:end] / self.close_pr[end - 1]).T
        return xt, self.pvm[end - 2]
The purpose of this class is to take the closing, high, and low prices over the last window_size observations and return the current xt. I have a pvm variable which holds all the weights.
When I do this:
pvm = (torch.ones(nobs, msecurities) / msecurities).to(device)
kraken_ds = KrakenDataSet(port, pvm, window_size=WINDOW_SIZE, device=device)

for xt, wt in kraken_ds:
    wt[0] = 5
I am able to see the change in pvm[48], which is exactly the row I want to modify (with window_size=50 and idx=0, __getitem__ returns self.pvm[48]).
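If I understand PyTorch indexing correctly, this works because self.pvm[end - 2] uses basic (integer) indexing, which returns a view that shares storage with pvm, so writing into wt writes straight through. A minimal standalone check of that assumption (toy tensor, not my actual data):

import torch

pvm = torch.ones(4, 3)
wt = pvm[2]                                 # basic indexing -> a view of row 2
wt[0] = 5.0                                 # write through the view
print(pvm[2, 0])                            # tensor(5.) -- visible in pvm
print(wt.data_ptr() == pvm[2].data_ptr())   # True -- same underlying storage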
But the moment I put this into a batch, the same modification doesn't propagate to the original pvm:
kraken_dl = DataLoader(
    kraken_ds,
    batch_size=BATCH_SIZE,
    shuffle=False,
    drop_last=False,
    # generator=torch.Generator(device="mps"),
)

for xt, wt in kraken_dl:
    wt[0][0] = 5
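A quick check also suggests the batch tensor no longer shares storage with pvm (the default collate_fn stacks the samples into a newly allocated batch tensor, if I understand it correctly):

xt, wt = next(iter(kraken_dl))
# with WINDOW_SIZE = 50, the first sample in the batch comes from pvm[48]
print(wt[0].data_ptr() == pvm[48].data_ptr())  # False -- the batch is a copy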
I searched around and I'm not sure whether the DataLoader creates a deep copy of the samples, which would explain why modifying the batch doesn't modify the original dataset. Can anyone suggest a workaround? Thanks.
In short: I want changes made inside a DataLoader batch to modify the original dataset.
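The only workaround I've come up with so far is to also return the row index from __getitem__ and write the updated weights back into pvm explicitly after collation. A self-contained sketch of the idea on a toy dataset (the name IndexedDataset is just for illustration, not my real class):

import torch
from torch.utils.data import Dataset, DataLoader

class IndexedDataset(Dataset):
    """Toy dataset that also returns each sample's row index so the
    caller can write updated weights back into pvm after collation."""
    def __init__(self, pvm: torch.Tensor):
        self.pvm = pvm

    def __len__(self):
        return self.pvm.shape[0]

    def __getitem__(self, idx):
        return self.pvm[idx], idx  # weight row plus its index in pvm

pvm = torch.ones(8, 3) / 3
dl = DataLoader(IndexedDataset(pvm), batch_size=4, shuffle=False)

for wt, rows in dl:
    wt[:, 0] = 5.0     # modifies only the collated copy...
    pvm[rows] = wt     # ...so write it back using the collected indices

print(pvm[0])  # tensor([5.0000, 0.3333, 0.3333]) -- the change persisted

Is there a cleaner way than this explicit write-back?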