
A NN whose first layer is fully connected and whose second layer has custom connections

As shown in the figure, it is a 3-layer NN: an input layer, a hidden layer and an output layer. I want to design the NN (in PyTorch, just the architecture) such that the input-to-hidden connection is fully connected. However, from the hidden layer to the output layer, the first two neurons of the hidden layer should be connected to the first neuron of the output layer, the next two to the second neuron of the output layer, and so on. How should this be designed?

from torch import nn
layer1 = nn.Linear(input_size, hidden_size)
layer2 = ??????

5 Answers


As @Jan said here, you can subclass nn.Linear and provide a point-wise mask to block the interactions you want to avoid. Remember that a fully connected layer is merely a matrix multiplication with an optional additive bias.

Looking at its source code, we can do:

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        self.mask = mask

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)*self.mask

Here F is torch.nn.functional (i.e. import torch.nn.functional as F).

Considering the constraint you have given to the second layer:

the first two neurons of the hidden layer should be connected to the first neuron of the output layer

It seems you are looking for this pattern:

tensor([[1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.]])

Which can be obtained using torch.block_diag:

mask = torch.block_diag(*[torch.ones(2,1),]*output_size)

Having this, you can define your network as:

net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    MaskedLinear(hidden_size, output_size, mask=mask))

If you feel like it, you can even implement it inside the custom layer:

class LocalLinear(nn.Linear):
    def __init__(self, *args, kernel_size=2, **kwargs):
        super().__init__(*args, **kwargs)

        assert self.in_features == kernel_size*self.out_features
        self.mask = torch.block_diag(*[torch.ones(kernel_size,1),]*self.out_features)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)*self.mask

And defining it like so:

net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    LocalLinear(hidden_size, output_size))

3 Comments

My input size is (batch_size, 100) and my mask is (100, 10). The line out = F.linear(input*self.mask, self.weight, self.bias) is throwing the error: RuntimeError: The size of tensor a (100) must match the size of tensor b (10) at non-singleton dimension 1
You're right, there was an issue. The mask should be applied after the linear layer is evaluated, not before. See my edit above.
This doesn't seem right. The weight matrix is the one that needs to be masked, not the output of weight*input + bias. After the multiply has happened, we cannot remove the unwanted interactions.

Instead of using nn.Linear directly, create a weight tensor weight and a mask tensor mask that zeroes out the weights you do not intend to use. Then use torch.nn.functional.linear(input, weight * mask) (https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html) to forward the second layer. Note that this goes in your torch.nn.Module's forward function. The weight needs to be registered as a parameter of your nn.Module so that it is recognized by nn.Module.parameters(). See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_parameter.
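A minimal sketch of this idea (the module name PairwiseMaskedLinear and the initialization are illustrative, assuming the two-to-one connectivity from the question and a mask of shape (out_features, in_features) to match F.linear):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseMaskedLinear(nn.Module):  # hypothetical name, just for illustration
    def __init__(self, in_features, out_features):
        super().__init__()
        # Trainable weight and bias, registered as parameters so that
        # nn.Module.parameters() (and therefore the optimizer) sees them.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Fixed 0/1 mask: output neuron i only sees hidden neurons 2*i and 2*i+1.
        mask = torch.zeros(out_features, in_features)
        for i in range(out_features):
            mask[i, 2 * i : 2 * i + 2] = 1.0
        self.register_buffer("mask", mask)  # not trained, but moved by .to(device)

    def forward(self, x):
        # Mask the weights, not the output, so unwanted connections never contribute.
        return F.linear(x, self.weight * self.mask, self.bias)

net = nn.Sequential(nn.Linear(20, 6), PairwiseMaskedLinear(6, 3))

Assigning an nn.Parameter attribute has the same effect as calling register_parameter explicitly.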


Ivan's general approach (masking the fully connected layer) may work with modifications as in my comment, but it adds a lot of useless computation!

It's probably best to write a custom layer here, with a weight matrix of shape (2, hidden_size//2). Then reshape the layer's input (the output of the hidden layer) from (hidden_size) to (hidden_size//2, 2) and do the matrix multiply.

Something like this (untested):

class MyLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = torch.zeros(2, in_channels // 2)
        self.bias = torch.zeros(in_channels // 2)

    def forward(self, inp):
        return torch.matmul(inp.reshape(-1, inp.shape[-1]//2, 2), self.weight) + self.bias

1 Comment

You should register weight and bias as an nn.Parameter. How do you make sure the gradients are back propagated correctly? Does matmul automatically create a grad_fn?

This problem really intrigued me, so I created a small self-contained example in this repository. I also wrote a blog post on medium explaining the logic and providing an interesting use case for layer masking.

TLDR: the currently highest-ranked answer from @Ivan contains two problems:

First, as was pointed out in one of the comments, you need to apply the mask to the weight and only then add the bias to it:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    x = torch.nn.functional.linear(x, self.weight * self.mask, self.bias)
    return x

Second, you have to be careful with the shape of your mask. The mask needs to match the shape of self.weight, which has the output neurons on the first axis. This makes sense if you think in terms of the matrix multiplication y = Ax, where A is our weight matrix, x the input and y the output. So in the above example this would be:

tensor([[1., 1., 0., 0., 0., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 0., 0., 0., 1., 1.]])
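Putting both fixes together, a sketch of a corrected layer (reusing the MaskedLinear name from the answer above and building the transposed mask with torch.block_diag) could look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        # The mask has the same (out_features, in_features) shape as self.weight.
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero out the unwanted weights first, then apply the affine map.
        return F.linear(x, self.weight * self.mask, self.bias)

# Output neurons on the first axis, e.g. hidden_size=6, output_size=3:
mask = torch.block_diag(*[torch.ones(1, 2)] * 3)
layer2 = MaskedLinear(6, 3, mask=mask)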

I hope this helps and make sure to check out the repo / blog post linked above.


In your specific case you can also use an nn.Conv1d layer with kernel_size=2, stride=2 and a single channel to connect the hidden neurons two-by-two:

import torch
from torch import nn

layer1 = nn.Linear(in_features=20, out_features=6)
unflatten = nn.Unflatten(dim=1, unflattened_size=(1, 6))
layer2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=2, bias=False)
flatten = nn.Flatten()
model = nn.Sequential(layer1, unflatten, layer2, flatten)

input = torch.randn((16, 20))
print(model(input).shape)
# Output: torch.Size([16, 3])

Here, the unflatten layer reshapes the tensor from [batch_size, n_hidden_features] to [batch_size, 1, n_hidden_features], which corresponds to a batch of 1-channel sequences of length n_hidden_features.

Note that to use this with non-batched inputs (of shape [20]), you would have to slightly change the unflatten and the flatten layers.
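For example, a sketch of the non-batched variant (assuming a PyTorch version in which nn.Conv1d accepts unbatched (channels, length) inputs) could be:

import torch
from torch import nn

layer1 = nn.Linear(in_features=20, out_features=6)
unflatten = nn.Unflatten(dim=0, unflattened_size=(1, 6))  # [6] -> [1, 6]
layer2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=2, bias=False)
flatten = nn.Flatten(start_dim=0)                         # [1, 3] -> [3]
model = nn.Sequential(layer1, unflatten, layer2, flatten)

x = torch.randn(20)
print(model(x).shape)
# Expected: torch.Size([3])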

Compared to using a mask, I would expect this approach to perform much better when the number of hidden features is large, since the convolution only computes the products that are actually needed instead of a full dense matrix multiplication that is masked afterwards.
