
I wonder, if I want to implement dropout by myself, whether something like the following is sufficient (taken from Implementing dropout from scratch):

import torch
import torch.nn as nn


class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super(MyDropout, self).__init__()
        if p < 0 or p > 1:
            raise ValueError("dropout probability has to be between 0 and 1, "
                             "but got {}".format(p))
        self.p = p

    def forward(self, X):
        if self.training:
            # Sample a 0/1 keep-mask with keep probability 1 - p, then rescale
            # the surviving activations by 1/(1 - p) (inverted dropout).
            binomial = torch.distributions.binomial.Binomial(probs=1 - self.p)
            return X * binomial.sample(X.size()) * (1.0 / (1 - self.p))
        # At evaluation time dropout is a no-op.
        return X
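
For reference, this is how I would sanity-check it (a minimal sketch; the tensor shape and seed are arbitrary):

import torch

torch.manual_seed(0)
drop = MyDropout(p=0.5)
x = torch.ones(4, 6)

drop.train()                      # training mode: random mask, survivors scaled to 2.0
print(drop(x))                    # entries are either 0.0 or 2.0

drop.eval()                       # eval mode: identity
print(torch.equal(drop(x), x))    # True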

My concern is that even if the unwanted weights are masked out (either this way or by using a mask tensor), gradients can still flow through the zeroed weights (https://discuss.pytorch.org/t/custom-connections-in-neural-network-layers/3027/9). Is my concern valid?

1 Answer

Dropout does not mask the weights - it masks the features.
For a linear layer implementing y = <w, x>, the gradient w.r.t. the parameters w is x. Therefore, if you set entries of x to zero, there is no update for the corresponding weights in the adjacent linear layer.
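
A quick way to see this (a minimal sketch, not part of the original answer; the layer sizes are arbitrary):

import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(4, 3, bias=False)

x = torch.randn(1, 4)
mask = torch.tensor([1., 0., 1., 1.])   # drop input feature index 1
y = fc(x * mask).sum()
y.backward()

# For y = sum(W @ x_masked), d y / d W[i, j] = x_masked[j], so the weight
# column that multiplied the zeroed feature receives a zero gradient.
print(fc.weight.grad)          # column 1 is all zeros
print(fc.weight.grad[:, 1])    # tensor([0., 0., 0.])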


4 Comments

I see. So is there a convenient way of doing that? Apparently I need to do conditional masking such that, when x is multiplied with different rows of w (considering the input to the first hidden layer), the unwanted forward passes get masked. It seems this will break a simple expression like x = F.relu(self.fc(x)). (Note: what I want is somewhat different from standard dropout; I want to zero-mask certain specified forward passes deterministically - see the sketch after these comments.)
@Zzy1130 the math of derivatives and chain rule does it for you.
Yes, now I can see why, with x being 0 when multiplied with certain rows of w, it won't update those parameters. But how do I get x to multiply with only certain rows of w? The math (matrix multiplication) itself dictates that x multiplies with all rows of w.
After referring to the formula in the first link I cited, I see the masking is done afterwards. Thanks for your reply.
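
A minimal sketch of the deterministic masking discussed in these comments, assuming a fixed 0/1 mask over the input features (the mask values and layer sizes here are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)
        # Fixed, deterministic 0/1 mask over the input features; registered as a
        # buffer so it follows .to(device) but is never updated by the optimizer.
        self.register_buffer("mask", torch.tensor([1., 1., 0., 1., 0., 1., 1., 1.]))

    def forward(self, x):
        # Zero the unwanted features before the linear layer; the weight columns
        # that would have multiplied them then receive zero gradient.
        return F.relu(self.fc(x * self.mask))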
