29 questions
0 votes · 0 answers · 34 views
Why does MATLAB selfAttentionLayer give different parameter counts for head/key-channel pairs with the same total key dimension?
I’m experimenting with the MathWorks example that inserts a multi-head self-attention layer into a simple CNN for the DigitDataset:
Link to example
layers = [
imageInputLayer([28 28 1])
...
2 votes · 1 answer · 183 views
How to Identify Similar Code Parts Using CodeBERT Embeddings?
I'm using CodeBERT to compare how similar two pieces of code are. For example:
# Code 1
def calculate_area(radius):
return 3.14 * radius * radius
# Code 2
def compute_circle_area(r):
return 3.14159 * ...
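For reference, a minimal sketch of one common approach, assuming the microsoft/codebert-base checkpoint from Hugging Face and mean pooling over the last hidden state (both are assumptions, not taken from the question): embed each snippet and compare the vectors with cosine similarity.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> torch.Tensor:
    # mean-pool the last hidden state into one vector per snippet
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)

code1 = "def calculate_area(radius):\n    return 3.14 * radius * radius"
code2 = "def compute_circle_area(r):\n    return 3.14159 * r * r"
sim = torch.nn.functional.cosine_similarity(embed(code1), embed(code2), dim=0)
print(f"cosine similarity: {sim.item():.3f}")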
-4 votes · 1 answer · 123 views
Failing to Finalize Execution Plan Using cuDNN Backend to Create a Fused Attention fprop Graph
I am working on implementing the Fused Attention fprop graph pattern. As of now I am only combining two matrix multiplications, meaning g3 and g4 are empty. I believe I have also matched all the ...
2 votes · 1 answer · 208 views
tensorflow.keras.layers.MultiHeadAttention warning that query layer is destroying mask
I am building a transformer model using tensorflow==2.16.1 and one of the layers is a tensorflow.keras.layers.MultiHeadAttention layer.
I implement the attention layer in the TransformerBlock below:
# ...
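As a point of comparison, a minimal sketch (not the asker's TransformerBlock) of a block that wraps tf.keras.layers.MultiHeadAttention and passes the padding mask explicitly via attention_mask instead of relying on implicit Keras mask propagation; layer sizes here are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

class TransformerBlock(layers.Layer):
    def __init__(self, embed_dim, num_heads, ff_dim, **kwargs):
        super().__init__(**kwargs)
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = tf.keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, x, attention_mask=None):
        # self-attention with an explicit mask rather than an implicit Keras mask
        attn_out = self.att(query=x, value=x, key=x, attention_mask=attention_mask)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))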
1 vote · 0 answers · 69 views
Multi-head self-attention for sentiment analysis does not give accurate results
I am trying to implement a model for sentiment analysis on text data using self-attention. In this example, I am using multi-head attention but cannot be sure whether the results are accurate. It ...
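For orientation, a hedged sketch of the general pattern (embedding, multi-head self-attention, pooling, classifier) in PyTorch; the hyperparameters and names are placeholders, not the asker's setup.
import torch
import torch.nn as nn

class AttentionSentimentModel(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=64, num_heads=4, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                 # (B, L, E)
        attn_out, _ = self.attn(x, x, x)          # self-attention over the tokens
        pooled = attn_out.mean(dim=1)             # average over the sequence
        return self.fc(pooled)

model = AttentionSentimentModel()
logits = model(torch.randint(0, 10000, (8, 20)))
print(logits.shape)  # torch.Size([8, 2])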
0 votes · 0 answers · 969 views
PyTorch Vision Transformer - How to Visualise Attention Layers
I am trying to extract the attention map for a PyTorch implementation of the Vision Transformer (ViT). However, I am having trouble understanding how to do this. I understand that doing this from ...
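One hedged way to collect the maps, assuming the torchvision layout where each encoder block stores an nn.MultiheadAttention as self_attention: wrap that module's forward so attention weights are requested and stashed (a sketch, not the only approach).
import torch
import torchvision

model = torchvision.models.vit_b_16(weights=None).eval()
attn_maps = []

def wrap(mha):
    orig_forward = mha.forward
    def forward(query, key, value, **kwargs):
        kwargs["need_weights"] = True            # torchvision passes False by default
        kwargs["average_attn_weights"] = True
        out, weights = orig_forward(query, key, value, **kwargs)
        attn_maps.append(weights.detach())
        return out, weights
    return forward

for block in model.encoder.layers:
    block.self_attention.forward = wrap(block.self_attention)

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))
print(len(attn_maps), attn_maps[0].shape)        # 12 layers, (1, 197, 197) each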
1 vote · 1 answer · 313 views
PyTorch MultiHeadAttention implementation
In PyTorch's MultiHeadAttention implementation, regarding in_proj_weight, is it true that the first embed_dim elements correspond to the query, the next embed_dim elements correspond to the key, and ...
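A small hedged sketch of that layout as I understand it: rows [0, embed_dim) of in_proj_weight project the query, the next embed_dim rows the key, and the last embed_dim rows the value.
import torch
import torch.nn.functional as F

embed_dim, num_heads = 8, 2
mha = torch.nn.MultiheadAttention(embed_dim, num_heads, bias=False, batch_first=True)

w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)   # three (embed_dim, embed_dim) blocks
x = torch.randn(1, 5, embed_dim)
q, k, v = F.linear(x, w_q), F.linear(x, w_k), F.linear(x, w_v)
print(q.shape, k.shape, v.shape)                     # torch.Size([1, 5, 8]) each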
0 votes · 1 answer · 468 views
Issue adding an attention block in a deep neural network for a regression problem
I want to add a tf.keras.layers.MultiHeadAttention layer between two layers of my neural network. However, I am getting an IndexError:
The detailed code is as follows:
x1 = Dense(58, activation='relu')(x1)
x1 =...
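For context, a hedged sketch of one way this is often wired up: tf.keras.layers.MultiHeadAttention expects a rank-3 (batch, sequence, features) tensor, so the rank-2 Dense output is given a length-1 sequence axis first (the layer sizes here are assumptions, not the asker's).
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(58,))
x1 = layers.Dense(58, activation="relu")(inputs)
x1 = layers.Reshape((1, 58))(x1)                       # (batch, 1, 58)
attn = layers.MultiHeadAttention(num_heads=2, key_dim=58)(x1, x1)
x1 = layers.Flatten()(attn)                            # back to (batch, 58)
x1 = layers.Dense(32, activation="relu")(x1)
outputs = layers.Dense(1)(x1)                          # regression head
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")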
0 votes · 0 answers · 256 views
PyTorch RuntimeError: Invalid Shape During Reshaping for Multi-Head Attention
I'm implementing a multi-head self-attention mechanism in PyTorch as part of a Text2Image model that I am trying to build, and I'm encountering a runtime error when trying to reshape the output of ...
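Without the full code it is hard to be specific, but as a generic reference the usual split/merge pattern looks like the sketch below; invalid-shape errors typically come from embed_dim not being divisible by num_heads, or from merging after a transpose without .contiguous() or .reshape.
import torch

batch, seq, embed_dim, num_heads = 2, 16, 64, 8
head_dim = embed_dim // num_heads
x = torch.randn(batch, seq, embed_dim)

# split heads: (B, S, E) -> (B, H, S, E/H)
heads = x.view(batch, seq, num_heads, head_dim).transpose(1, 2)

# ... attention would happen here ...

# merge heads: (B, H, S, E/H) -> (B, S, E); the transpose makes the tensor
# non-contiguous, so use .contiguous().view or .reshape
merged = heads.transpose(1, 2).contiguous().view(batch, seq, embed_dim)
assert torch.equal(merged, x)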
2 votes · 0 answers · 192 views
What is the reason for MultiHeadAttention having a different call convention than Attention and AdditiveAttention?
Attention and AdditiveAttention are called with their input tensors in a list (the same as Add, Average, Concatenate, Dot, Maximum, Multiply, and Subtract).
But MultiHeadAttention is called by passing the ...
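For a concrete side-by-side of the two calling conventions being compared (shapes are illustrative):
import tensorflow as tf
from tensorflow.keras import layers

q = tf.random.normal((2, 8, 16))
v = tf.random.normal((2, 8, 16))

out1 = layers.Attention()([q, v])                      # inputs passed as a list
out2 = layers.AdditiveAttention()([q, v])              # same list convention
mha = layers.MultiHeadAttention(num_heads=2, key_dim=16)
out3 = mha(query=q, value=v)                           # named arguments instead
print(out1.shape, out2.shape, out3.shape)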
0 votes · 0 answers · 191 views
How to convert TensorFlow multi-head attention to PyTorch?
I'm converting a TensorFlow transformer model to its PyTorch equivalent.
In the TF multi-head attention part of the code I have:
att = layers.MultiHeadAttention(num_heads=6, key_dim=4)
and the input shape is [...
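A hedged sketch of a rough PyTorch counterpart; note that the semantics differ, since Keras key_dim is the per-head projection size while nn.MultiheadAttention takes the total embed_dim and splits it across heads, so the mapping below assumes embed_dim = num_heads * key_dim.
import torch
import torch.nn as nn

num_heads, key_dim = 6, 4
embed_dim = num_heads * key_dim                 # 24

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
x = torch.randn(2, 10, embed_dim)               # (batch, seq, embed_dim)
out, attn = mha(x, x, x)
print(out.shape)                                # torch.Size([2, 10, 24])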
0 votes · 0 answers · 128 views
Exception encountered when calling layer 'tft_multi_head_attention' (type TFTMultiHeadAttention)
I am trying to build a forecasting model with the tft module (Temporal Fusion Transformer). I am getting the error below when I try to train the model. Since I am new to TensorFlow, I can't understand ...
0 votes · 1 answer · 236 views
Running speed of PyTorch MultiheadAttention compared to Torchvision MViT
I am currently experimenting with my model, which uses the Torchvision implementation of MViT_v2_s as its backbone. I added a few cross-attention modules to the model, which looks roughly like this:
class ...
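A minimal hedged sketch of the kind of cross-attention module described (the dimensions and names are assumptions), useful as a baseline when profiling it against the backbone.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, query_tokens, context_tokens):
        q = self.norm_q(query_tokens)
        kv = self.norm_kv(context_tokens)
        out, _ = self.attn(q, kv, kv, need_weights=False)
        return query_tokens + out   # residual connection

block = CrossAttention(dim=96, num_heads=4)
out = block(torch.randn(2, 16, 96), torch.randn(2, 196, 96))
print(out.shape)  # torch.Size([2, 16, 96])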
1 vote · 0 answers · 226 views
How to access the value projection of the MultiHeadAttention layer in PyTorch
I'm making my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism ...
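Since nn.MultiheadAttention does not expose its value projection as a separate submodule, one hedged workaround is to write the attention by hand so the Q/K/V projections and the score bias are all accessible (a sketch, not the asker's code; dimensions are placeholders).
import torch
import torch.nn as nn

class BiasedSelfAttention(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.h, self.d = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x, attn_bias):
        # x: (B, N, dim), attn_bias: (B, H, N, N) edge-based bias on the scores
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.h, self.d).transpose(1, 2)
        k = k.view(B, N, self.h, self.d).transpose(1, 2)
        v = v.view(B, N, self.h, self.d).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d ** 0.5 + attn_bias
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, self.h * self.d)
        return self.out(out)

layer = BiasedSelfAttention(dim=32, num_heads=4)
y = layer(torch.randn(2, 10, 32), torch.zeros(2, 4, 10, 10))
print(y.shape)  # torch.Size([2, 10, 32])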
1 vote · 1 answer · 824 views
Multi-head attention calculation
I create a model with a multi-head attention layer:
import torch
import torch.nn as nn
query = torch.randn(2, 4)
key = torch.randn(2, 4)
value = torch.randn(2, 4)
model = nn.MultiheadAttention(4, 1, ...
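A hedged reconstruction of the kind of check this usually involves: run nn.MultiheadAttention with a single head and reproduce its output by hand from the packed projection weights. The constructor arguments below (bias disabled) are assumptions where the question's code is truncated.
import torch
import torch.nn as nn

query = torch.randn(2, 4)
key = torch.randn(2, 4)
value = torch.randn(2, 4)

model = nn.MultiheadAttention(embed_dim=4, num_heads=1, bias=False)
out, weights = model(query, key, value)

# manual computation with the same packed projection weights
w_q, w_k, w_v = model.in_proj_weight.chunk(3, dim=0)
q, k, v = query @ w_q.T, key @ w_k.T, value @ w_v.T
scores = (q @ k.T) / 4 ** 0.5                     # scale by sqrt(head_dim)
manual = scores.softmax(dim=-1) @ v @ model.out_proj.weight.T
print(torch.allclose(out, manual, atol=1e-6))     # expect True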