
A NN whose first layer is fully connected and whose second layer has custom connections

As shown in the figure, it is a 3-layer NN: an input layer, a hidden layer and an output layer. I want to design the NN (in PyTorch, just the architecture) such that the input-to-hidden connection is fully connected. However, from the hidden layer to the output layer, the first two neurons of the hidden layer should be connected to the first neuron of the output layer, the next two to the second neuron of the output layer, and so on. How should this be designed?

from torch import nn
layer1 = nn.Linear(input_size, hidden_size)
layer2 = ??????

5 Answers


As @Jan said here, you can subclass nn.Linear and provide a point-wise mask to block the interactions you want to avoid. Remember that a fully connected layer is merely a matrix multiplication with an optional additive bias.

Looking at its source code, we can do:

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        self.mask = mask

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)*self.mask

Here F is torch.nn.functional (i.e. import torch.nn.functional as F).

Considering the constraint you have given to the second layer:

the first two neurons of the hidden layer should be connected to the first neuron of the output layer

It seems you are looking for this pattern:

tensor([[1., 0., 0.],
        [1., 0., 0.],
        [0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 1.]])

Which can be obtained using torch.block_diag:

mask = torch.block_diag(*[torch.ones(2,1),]*output_size)

Having this, you can define your network as:

net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    MaskedLinear(hidden_size, output_size, mask=mask))

If you feel like it, you can even implement it inside the custom layer:

class LocalLinear(nn.Linear):
    def __init__(self, *args, kernel_size=2, **kwargs):
        super().__init__(*args, **kwargs)

        assert self.in_features == kernel_size*self.out_features
        self.mask = torch.block_diag(*[torch.ones(kernel_size,1),]*self.out_features)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)*self.mask

And defining it like so:

net = nn.Sequential(nn.Linear(input_size, hidden_size),
                    LocalLinear(hidden_size, output_size))

3 Comments

My input size is (batch_size, 100) and my mask is (100, 10). The line out = F.linear(input*self.mask, self.weight, self.bias) is throwing the error: RuntimeError: The size of tensor a (100) must match the size of tensor b (10) at non-singleton dimension 1
You're right, there was an issue. The mask should be applied after the linear layer is evaluated, not before. See my edit above.
This doesn't seem right. The weight matrix is the one that needs to be masked, not the output of weight*input + bias. After the multiply has happened, we cannot remove the unwanted interactions.

Instead of using nn.Linear directly, create a weight tensor weight and a mask tensor mask that zeroes out the weights you do not intend to use. Then use torch.nn.functional.linear(input, weight * mask) (https://pytorch.org/docs/stable/generated/torch.nn.functional.linear.html) to forward the second layer. Note that this goes in your torch.nn.Module's forward function. The weight needs to be registered as a parameter of your nn.Module so that it is recognized by nn.Module.parameters(). See https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_parameter.
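A minimal sketch of this idea (the module name PairwiseMaskedLinear and the initialization are illustrative, assuming the two-to-one connectivity from the question and a mask of shape (out_features, in_features) to match F.linear):

import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseMaskedLinear(nn.Module):  # hypothetical name, just for illustration
    def __init__(self, in_features, out_features):
        super().__init__()
        # Trainable weight and bias, registered as parameters so that
        # nn.Module.parameters() (and therefore the optimizer) sees them.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Fixed 0/1 mask: output neuron i only sees hidden neurons 2*i and 2*i+1.
        mask = torch.zeros(out_features, in_features)
        for i in range(out_features):
            mask[i, 2 * i : 2 * i + 2] = 1.0
        self.register_buffer("mask", mask)  # not trained, but moved by .to(device)

    def forward(self, x):
        # Mask the weights, not the output, so unwanted connections never contribute.
        return F.linear(x, self.weight * self.mask, self.bias)

net = nn.Sequential(nn.Linear(20, 6), PairwiseMaskedLinear(6, 3))

Assigning an nn.Parameter attribute has the same effect as calling register_parameter explicitly.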


Ivan's general approach (masking the fully connected layer) may work with modifications as in my comment, but it adds a lot of useless computation!

It's probably best to write a custom layer here, with a weight matrix of shape (2, hidden_size//2). Then reshape the layer's input (the output of the hidden layer) from (hidden_size) to (hidden_size//2, 2) and do the matrix multiply.

Something like this (untested):

class MyLayer(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = torch.zeros(2, in_channels // 2)
        self.bias = torch.zeros(in_channels // 2)

    def forward(self, inp):
        return torch.matmul(inp.reshape(-1, inp.shape[-1]//2, 2), self.weight) + self.bias

1 Comment

You should register weight and bias as an nn.Parameter. How do you make sure the gradients are back propagated correctly? Does matmul automatically create a grad_fn?

This problem really intrigued me, so I created a small self-contained example in this repository. I also wrote a blog post on medium explaining the logic and providing an interesting use case for layer masking.

TLDR: the currently highest-ranked answer from @Ivan contains two problems:

First, as was pointed out in one of the comments, you need to apply the mask to the weight and only then add the bias to it:

def forward(self, x: torch.Tensor) -> torch.Tensor:
    x = torch.nn.functional.linear(x, self.weight * self.mask, self.bias)
    return x

Second, you have to be careful with the shape of your mask. The mask needs to match the shape of self.weight, which has the output neurons on the first axis. This makes sense if you think in terms of the matrix multiplication y = Ax, where A is our weight matrix, x the input and y the output. So in the above example this would be:

tensor([[1., 1., 0., 0., 0., 0.],
        [0., 0., 1., 1., 0., 0.],
        [0., 0., 0., 0., 1., 1.]])
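Putting both fixes together, a sketch of a corrected layer (reusing the MaskedLinear name from the answer above and building the transposed mask with torch.block_diag) could look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    def __init__(self, *args, mask, **kwargs):
        super().__init__(*args, **kwargs)
        # The mask has the same (out_features, in_features) shape as self.weight.
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Zero out the unwanted weights first, then apply the affine map.
        return F.linear(x, self.weight * self.mask, self.bias)

# Output neurons on the first axis, e.g. hidden_size=6, output_size=3:
mask = torch.block_diag(*[torch.ones(1, 2)] * 3)
layer2 = MaskedLinear(6, 3, mask=mask)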

I hope this helps and make sure to check out the repo / blog post linked above.


In your specific case you can also use an nn.Conv1d layer with kernel_size=2, stride=2 and a single channel to connect the hidden neurons two-by-two:

import torch
from torch import nn

layer1 = nn.Linear(in_features=20, out_features=6)
unflatten = nn.Unflatten(dim=1, unflattened_size=(1, 6))
layer2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=2, bias=False)
flatten = nn.Flatten()
model = nn.Sequential(layer1, unflatten, layer2, flatten)

input = torch.randn((16, 20))
print(model(input).shape)
# Output: torch.Size([16, 3])

Here, the unflatten layer reshapes the tensor from [batch_size, n_hidden_features] to [batch_size, 1, n_hidden_features], which corresponds to a batch of 1-channel sequences of length n_hidden_features.

Note that to use this with non-batched inputs (of shape [20]), you would have to slightly change the unflatten and the flatten layers.
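For example, a sketch of the non-batched variant (assuming a PyTorch version in which nn.Conv1d accepts unbatched (channels, length) inputs) could be:

import torch
from torch import nn

layer1 = nn.Linear(in_features=20, out_features=6)
unflatten = nn.Unflatten(dim=0, unflattened_size=(1, 6))  # [6] -> [1, 6]
layer2 = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=2, stride=2, bias=False)
flatten = nn.Flatten(start_dim=0)                         # [1, 3] -> [3]
model = nn.Sequential(layer1, unflatten, layer2, flatten)

x = torch.randn(20)
print(model(x).shape)
# Expected: torch.Size([3])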

Compared to using a mask, I would expect this approach to perform much better when the number of hidden features is large, since the convolution only computes the products that are actually needed instead of a full dense matrix multiplication that is masked afterwards.
