0 votes
0 answers
34 views

I’m experimenting with the MathWorks example that inserts a multi-head self-attention layer into a simple CNN for the DigitDataset: Link to example layers = [ imageInputLayer([28 28 1]) ...
Hend mahmoud
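The MATLAB layers themselves are not reproduced here; for orientation only, a rough PyTorch analogue of the same idea (a small digit-classification CNN with a multi-head self-attention layer inserted after the convolutional stage) might look like the sketch below. The layer sizes are assumptions and do not mirror the MathWorks example exactly.

    import torch
    import torch.nn as nn

    class DigitCNNWithAttention(nn.Module):
        def __init__(self, num_classes=10, num_heads=4):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 28x28 -> 14x14
            )
            self.attn = nn.MultiheadAttention(embed_dim=32, num_heads=num_heads, batch_first=True)
            self.head = nn.Linear(32, num_classes)

        def forward(self, x):                       # x: (B, 1, 28, 28)
            f = self.conv(x)                        # (B, 32, 14, 14)
            tokens = f.flatten(2).transpose(1, 2)   # (B, 196, 32): one token per spatial position
            attn_out, _ = self.attn(tokens, tokens, tokens)
            return self.head(attn_out.mean(dim=1))  # pool over tokens, then classify

    print(DigitCNNWithAttention()(torch.randn(8, 1, 28, 28)).shape)  # torch.Size([8, 10])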
2 votes
1 answer
183 views

I'm using CodeBERT to compare how similar two pieces of code are. For example: # Code 1 def calculate_area(radius): return 3.14 * radius * radius # Code 2 def compute_circle_area(r): return 3.14159 * ...
Nep • 21
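A minimal sketch of the comparison described above, assuming the Hugging Face transformers package and the microsoft/codebert-base checkpoint; mean-pooling the last hidden state and the exact snippets are illustrative choices, not the asker's code.

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModel.from_pretrained("microsoft/codebert-base")

    code1 = "def calculate_area(radius): return 3.14 * radius * radius"
    code2 = "def compute_circle_area(r): return 3.14159 * r * r"

    def embed(code):
        # Tokenize and mean-pool the last hidden state into one vector per snippet
        inputs = tokenizer(code, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1)

    sim = torch.nn.functional.cosine_similarity(embed(code1), embed(code2))
    print(f"cosine similarity: {sim.item():.3f}")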
-4 votes
1 answer
123 views

I am working on implementing the Fused Attention fprop graph pattern. As of now I am only combining two matrix multiplications, meaning g3 and g4 are empty. I believe I have also matched all the ...
BigWinnz101
2 votes
1 answer
208 views

I am building a transformer model using tensorflow==2.16.1 and one of the layers is a tensorflow.keras.layers.MultiHeadAttention layer. I implement the attention layer in the TransformerBlock below: # ...
Stod • 83
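A minimal TransformerBlock sketch along the lines the question describes, written against tf.keras as shipped with TF 2.16; the layer sizes and names here are assumptions, not the asker's code.

    import tensorflow as tf
    from tensorflow.keras import layers

    class TransformerBlock(layers.Layer):
        def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1):
            super().__init__()
            self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)
            self.ffn = tf.keras.Sequential(
                [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
            )
            self.norm1 = layers.LayerNormalization(epsilon=1e-6)
            self.norm2 = layers.LayerNormalization(epsilon=1e-6)
            self.drop1 = layers.Dropout(rate)
            self.drop2 = layers.Dropout(rate)

        def call(self, x, training=False):
            # Self-attention: query, key and value are all the block input
            attn_out = self.att(x, x, training=training)
            x = self.norm1(x + self.drop1(attn_out, training=training))
            ffn_out = self.ffn(x)
            return self.norm2(x + self.drop2(ffn_out, training=training))

    block = TransformerBlock(embed_dim=64, num_heads=4, ff_dim=128)
    print(block(tf.random.normal((2, 10, 64))).shape)  # (2, 10, 64)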
1 vote
0 answers
69 views

I am trying to implement a model for sentiment analysis on text data using self-attention. In this example, I am using multi-head attention but cannot be sure whether the results are accurate or not. It ...
phd Mom • 11
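For sanity-checking that multi-head self-attention is wired correctly in a sentiment classifier, a small PyTorch reference sketch like the one below can help; the vocabulary size, dimensions and mean-pooling choice are assumptions, not the asker's setup.

    import torch
    import torch.nn as nn

    class AttentionSentimentClassifier(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=128, num_heads=4, num_classes=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(embed_dim)
            self.fc = nn.Linear(embed_dim, num_classes)

        def forward(self, token_ids):
            x = self.embed(token_ids)                    # (batch, seq, embed)
            attn_out, attn_weights = self.attn(x, x, x)  # self-attention over tokens
            x = self.norm(x + attn_out)                  # residual + norm
            return self.fc(x.mean(dim=1)), attn_weights  # mean-pool, then classify

    model = AttentionSentimentClassifier()
    logits, weights = model(torch.randint(0, 10000, (8, 20)))
    print(logits.shape, weights.shape)  # torch.Size([8, 2]) torch.Size([8, 20, 20])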
0 votes
0 answers
969 views

I am trying to extract the attention map from a PyTorch implementation of the Vision Transformer (ViT). However, I am having trouble understanding how to do this. I understand that doing this from ...
Peter • 9
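One common way to get at the maps is a forward hook on the module that sees the attention matrix. The sketch below assumes a timm ViT, where disabling the fused-attention path makes the softmaxed scores pass through attn_drop; attribute names can differ between timm versions, so treat this as a sketch rather than a recipe.

    import timm
    import torch

    model = timm.create_model("vit_base_patch16_224", pretrained=False).eval()

    attention_maps = []

    def save_attention(module, inputs, output):
        # The input to attn_drop is the softmaxed attention matrix:
        # shape (batch, num_heads, num_tokens, num_tokens)
        attention_maps.append(inputs[0].detach())

    for block in model.blocks:
        block.attn.fused_attn = False  # force the non-fused path that materialises the matrix
        block.attn.attn_drop.register_forward_hook(save_attention)

    with torch.no_grad():
        model(torch.randn(1, 3, 224, 224))

    print(len(attention_maps), attention_maps[0].shape)  # 12 blocks, (1, 12, 197, 197)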
1 vote
1 answer
313 views

In PyTorch's MultiheadAttention implementation, regarding in_proj_weight, is it true that the first embed_dim elements correspond to the query, the next embed_dim elements correspond to the key, and ...
carpet119
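The layout the question asks about can be checked numerically: slice in_proj_weight into thirds, run the attention arithmetic by hand, and compare with the module's output. A sketch, assuming batch_first and no biases for brevity:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    embed_dim, num_heads, seq_len = 8, 2, 5
    mha = nn.MultiheadAttention(embed_dim, num_heads, bias=False, batch_first=True).eval()
    x = torch.randn(1, seq_len, embed_dim)

    ref, _ = mha(x, x, x)

    # Rows [0:E] of in_proj_weight project the query, [E:2E] the key, [2E:3E] the value
    w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
    head_dim = embed_dim // num_heads

    def heads(t):
        return t.view(1, seq_len, num_heads, head_dim).transpose(1, 2)

    q, k, v = heads(x @ w_q.T), heads(x @ w_k.T), heads(x @ w_v.T)
    attn = torch.softmax(q @ k.transpose(-2, -1) / head_dim ** 0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(1, seq_len, embed_dim) @ mha.out_proj.weight.T

    print(torch.allclose(out, ref, atol=1e-6))  # True if the q/k/v ordering is as described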
0 votes
1 answer
468 views

I want to add a tf.keras.layers.MultiHeadAttention between two layers of a neural network. However, I am getting an IndexError. The detailed code is as follows: x1 = Dense(58, activation='relu')(x1) x1 =...
Zeshan Akber
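An IndexError in that position typically comes from feeding MultiHeadAttention a 2-D Dense output, since the layer expects at least a (batch, sequence, features) tensor. A sketch of one way around it, with input and head dimensions chosen as assumptions rather than taken from the asker's code:

    import tensorflow as tf
    from tensorflow.keras import layers

    inputs = layers.Input(shape=(116,))
    x1 = layers.Dense(58, activation="relu")(inputs)     # shape (batch, 58): only 2-D
    x1 = layers.Reshape((1, 58))(x1)                     # add a length-1 sequence axis
    x1 = layers.MultiHeadAttention(num_heads=2, key_dim=29)(x1, x1)  # self-attention over that axis
    x1 = layers.Flatten()(x1)
    outputs = layers.Dense(1, activation="sigmoid")(x1)

    model = tf.keras.Model(inputs, outputs)
    model.summary()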
0 votes
0 answers
256 views

I'm implementing a multi-head self-attention mechanism in PyTorch as part of a Text2Image model that I am trying to build, and I'm encountering a runtime error when trying to reshape the output of ...
venkatesh • 162
2 votes
0 answers
192 views

Attention and AdditiveAttention are called with their input tensors in a list (the same as Add, Average, Concatenate, Dot, Maximum, Multiply, and Subtract), but MultiHeadAttention is called by passing the ...
Tobias Hermann
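The two call conventions can be seen side by side; the snippet below simply exercises both Keras signatures on random data and is not tied to any particular model.

    import tensorflow as tf
    from tensorflow.keras import layers

    query = tf.random.normal((2, 8, 16))
    value = tf.random.normal((2, 10, 16))

    # Attention / AdditiveAttention take their tensors wrapped in a single list argument
    out1 = layers.Attention()([query, value])
    out2 = layers.AdditiveAttention()([query, value])

    # MultiHeadAttention takes query and value (and optionally key) as separate arguments
    out3 = layers.MultiHeadAttention(num_heads=2, key_dim=8)(query, value)

    print(out1.shape, out2.shape, out3.shape)  # (2, 8, 16) for each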
0 votes
0 answers
191 views

I'm converting a TensorFlow transformer model to its PyTorch equivalent. In the TF multi-head attention part of the code I have: att = layers.MultiHeadAttention(num_heads=6, key_dim=4) and the input shape is [...
ORC • 18
0 votes
0 answers
128 views

I am trying to build a forecasting model with the tft module (Temporal Fusion Transformer). I am getting the error below when trying to train the model; since I am new to TensorFlow, I can't understand ...
Navneet
0 votes
1 answer
236 views

I am currently experimenting with my model, which uses the Torchvision implementation of MViT_v2_s as its backbone. I added a few cross-attention modules to the model, which looks roughly like this: class ...
whz • 11
1 vote
0 answers
226 views

I'm writing my own implementation of the Graphormer architecture. Since this architecture needs to add an edge-based bias to the output of the key-query multiplication in the self-attention mechanism ...
Angelo • 665
1 vote
1 answer
824 views

I create a model with a multi-head attention layer: import torch import torch.nn as nn query = torch.randn(2, 4) key = torch.randn(2, 4) value = torch.randn(2, 4) model = nn.MultiheadAttention(4, 1, ...
apostofes • 3,833