1,098 questions
0 votes · 0 answers · 87 views
Torch example transformer with TransformerDecoder
In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.TransformerEncoder, and torch.TransformerDecoder is overwritten with a ...
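For reference, the pattern that example follows can be sketched as an encoder-only model whose "decoder" is just a linear projection to the vocabulary (a minimal sketch, not the example's exact code; ntoken and the layer sizes are placeholders):

import math
import torch
import torch.nn as nn

class EncoderOnlyLM(nn.Module):
    # The "decoder" here is only a linear projection to vocabulary logits,
    # standing in for torch.nn.TransformerDecoder in this sketch.
    def __init__(self, ntoken, d_model=200, nhead=2, nlayers=2):
        super().__init__()
        self.embed = nn.Embedding(ntoken, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.decoder = nn.Linear(d_model, ntoken)
        self.d_model = d_model

    def forward(self, src, src_mask=None):
        x = self.embed(src) * math.sqrt(self.d_model)
        x = self.encoder(x, mask=src_mask)
        return self.decoder(x)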
0 votes · 0 answers · 157 views
Why does my Transformer model not work well on single-cell multi-omic data?
The complete code and data are available at: Google Disk
I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
1 vote · 1 answer · 115 views
Can I use a custom attention layer while still leveraging a pre-trained BERT model?
In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
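For illustration, a framework-agnostic sketch of that idea, where a hypothetical [seq, seq] prior matrix is multiplied into the raw attention scores before the softmax (the exact placement in the paper may differ):

import torch

def attention_with_prior(q, k, v, prior):
    # q, k, v: [batch, heads, seq, head_dim]; prior: [seq, seq] similarity matrix (hypothetical)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores * prior            # inject prior knowledge into the attention scores
    weights = torch.softmax(scores, dim=-1)
    return weights @ v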
0 votes · 1 answer · 44 views
Is Multi-Head Self-Attention in the Transformer permutation-invariant or equivariant, and how can I see this in practice?
I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation.
So, to check what equivariant and permutation-invariant mean, I wrote the following code:
import torch
import torch....
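For reference, one small self-contained check of the equivariance property (self-attention without positional encodings; permuting the input tokens should permute the output in the same way):

import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True).eval()

x = torch.randn(1, 5, 8)                    # [batch, tokens, dim]
perm = torch.randperm(5)

out, _ = attn(x, x, x)                      # f(x)
out_p, _ = attn(x[:, perm], x[:, perm], x[:, perm])    # f(P(x))

print(torch.allclose(out[:, perm], out_p, atol=1e-5))  # True: f(P(x)) == P(f(x))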
0 votes · 0 answers · 66 views
Why does adding token and positional embeddings in transformers work?
In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers:
import torch
import torch.nn as nn
class ...
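For context, the pattern being asked about is roughly the following (a minimal sketch with learned positional embeddings; the sizes are placeholders):

import torch
import torch.nn as nn

vocab_size, max_len, d_model = 100, 16, 32
tok_emb = nn.Embedding(vocab_size, d_model)            # one vector per token id
pos_emb = nn.Embedding(max_len, d_model)               # one vector per position

ids = torch.randint(0, vocab_size, (2, max_len))       # [batch, seq]
positions = torch.arange(max_len).unsqueeze(0)         # [1, seq], broadcasts over the batch
x = tok_emb(ids) + pos_emb(positions)                  # summed embeddings fed to the attention layers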
0 votes · 0 answers · 96 views
Training and validation losses do not decrease when fine-tuning ViTPose from Hugging Face
I am trying to fine-tune a transformer/encoder based pose estimation model available here at:
https://huggingface.co/docs/transformers/en/model_doc/vitpose
When passing the "labels" attribute to ...
2 votes · 1 answer · 84 views
Logits Don't Change in a Custom Reimplementation of a CLIP model [PyTorch]
The problem
The similarity scores are almost the same for a text describing a photo of a cat and a text describing a photo of a dog (the photo is of a cat).
Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>)
...
0 votes · 0 answers · 78 views
SageMaker Real-Time Endpoint Timeout Issues with Lambda for Parallel Data Processing
I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
0 votes · 0 answers · 41 views
PyTorch Transformer Stuck in Local Minima Occasionally
I am working on a project to pre-train a custom transformer model I developed and then fine-tune it for a downstream task. I am pre-training the model on an H100 cluster and this is working great. ...
0 votes · 0 answers · 164 views
How do I resolve the ImportError "Using bitsandbytes 4bit quantization requires the latest version of bitsandbytes" despite having version 0.45.3 installed?
I am trying to use the bitsandbytes library for 4-bit quantization in my model loading function, but I keep encountering an ImportError. The error message says, "Using bitsandbytes 4-bit ...
0 votes · 0 answers · 34 views
How to change the last layer in a fine-tuned model?
When fine-tuning Hubert to detect phonemes, I started from a fine-tuned ASR Hubert model, removed the last two layers, and added a linear layer sized to the phoneme vocab_size in the config. What is ...
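For illustration only, one common way to swap the classification head with Hugging Face looks roughly like this (a sketch assuming HubertForCTC; the checkpoint name and phoneme vocabulary size are placeholders, and lm_head is the usual name of the CTC head on these models):

import torch.nn as nn
from transformers import HubertForCTC

phoneme_vocab_size = 45    # hypothetical phoneme inventory size
model = HubertForCTC.from_pretrained("facebook/hubert-large-ls960-ft")

# replace the ASR head with a phoneme-sized linear layer and keep the config in sync
model.lm_head = nn.Linear(model.config.hidden_size, phoneme_vocab_size)
model.config.vocab_size = phoneme_vocab_size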
0 votes · 1 answer · 181 views
Trouble understanding the formula for estimating dense self-attention FLOPs per token given as 6LH(2QT)
Appendix B of the PaLM paper (https://arxiv.org/pdf/2204.02311) describes a metric called "Model FLOPs Utilization (MFU)" and the formula for estimating it. Its computation makes ...
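One plausible way to unpack that expression (an interpretation, not a restatement of the paper's derivation): for each of the L layers and H heads, a token's row of QK^T is T dot products of length Q, roughly 2QT FLOPs, and the attention-weights-times-V product costs about the same; counting the backward pass as roughly twice the forward pass gives 3 × 2 × (2QT) per token, per layer and head, i.e. 6LH(2QT) = 12·L·H·Q·T in total.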
2 votes · 1 answer · 457 views
What to do when the gradient explodes in a Transformer model?
General question (hopefully useful for people coming from google): What to do when the gradient explodes? When working with transformers and deep NNs (with PyTorch), do you have a mental checklist of ...
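One first-line item on such a checklist, for reference: clip the global gradient norm between backward() and the optimizer step (a minimal runnable sketch with a stand-in model):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                      # stand-in for the transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global gradient norm
optimizer.step()
optimizer.zero_grad()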
2 votes · 0 answers · 316 views
Timestamps reset every 30 seconds when using distil-whisper with return_timestamps=True
Problem
distil-large-v3#sequential-long-form
I'm using distil-whisper through the 🤗 Transformers pipeline for speech recognition. When setting return_timestamps=True, the timestamps reset to 0 every ...
0 votes · 0 answers · 12 views
Reverse Mapping of Table Elements from screenshot | Table Transformer
I am working on an end-to-end (E2E) project for websites that involves:
Capturing Tight Screenshots of Data Tables: The project automatically detects and takes precise screenshots of all the data ...
1 vote · 1 answer · 449 views
ValueError: Exception encountered when calling layer 'tf_bert_model' (type TFBertModel)
I have been trying to run TFBertModel from Transformers, but it kept throwing this error:
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 ...
0 votes · 1 answer · 681 views
How to correctly apply LayerNorm after MultiheadAttention with different input shapes (batch_first vs default) in PyTorch?
I’m working on an audio recognition task using a Transformer-based model in PyTorch. My input features are generated by a CNN-based embedding layer and have the shape [batch_size, d_model, n_token], ...
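For reference, a minimal sketch of the usual shape handling here, assuming the CNN output really is [batch_size, d_model, n_token]: permute to [batch, n_token, d_model] so that batch_first MultiheadAttention and LayerNorm(d_model) both see the feature dimension last.

import torch
import torch.nn as nn

batch, d_model, n_token = 4, 64, 50
x = torch.randn(batch, d_model, n_token)      # CNN embedding output: [batch, d_model, n_token]

x = x.permute(0, 2, 1)                        # -> [batch, n_token, d_model]
attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
norm = nn.LayerNorm(d_model)                  # normalizes over the last (feature) dimension

attn_out, _ = attn(x, x, x)
out = norm(x + attn_out)                      # residual connection + LayerNorm over d_model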
0 votes · 0 answers · 191 views
Fine-tune Tacotron2 and WaveGlow pre-trained models from Nvidia's Tacotron and WaveGlow models
Does anyone know how to fine-tune Tacotron2 and WaveGlow models starting from Nvidia's pre-trained Tacotron and WaveGlow models?
First, I created my own dataset in the same format as the LJSpeech ...
0 votes · 1 answer · 49 views
Compare two consecutive rows in DataStage and discard the rows that don't meet a condition
I'm reading a file with a Sequential File stage in DataStage and doing some transformation of the data with a Transformer stage. I want to compare the current row with the previous row, to check a value of ...
0 votes · 0 answers · 41 views
Missing a required argument: 'dec_input' in Transformer Model
I am busy with a forecasting model, and have turned to Transformers to see if they will be able to perform better than other sequence models.
I keep getting the error:
TypeError ...
0 votes · 1 answer · 36 views
Unable to figure out the hardware requirements (cloud or on-prem) for open-source inference for multiple users
I am trying to budget for setting up an LLM-based RAG application which will serve a dynamic number of users (anything from 100 to 2000).
I am able to figure out the GPU requirement to host a certain LLM [...
1 vote · 0 answers · 34 views
PyTorch quantized linear function gives a shape-invalid error
I am trying to write a simple quantized tensor linear multiplication. Assuming the weight matrix w3 of shape (14336, 4096) and the input tensor x of shape (2, 512, 4096), where the first dim is ...
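Independent of the quantization backend, a common way around this kind of shape error is to flatten the leading dimensions to 2-D before the linear op and restore them afterwards; a plain-tensor sketch with the shapes from the question:

import torch

w3 = torch.randn(14336, 4096)                 # weight: [out_features, in_features]
x = torch.randn(2, 512, 4096)                 # input: [batch, seq, in_features]

x2d = x.reshape(-1, 4096)                     # [batch*seq, in_features]
y2d = x2d @ w3.t()                            # 2-D matmul -> [batch*seq, out_features]
y = y2d.reshape(2, 512, 14336)                # restore [batch, seq, out_features]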
0 votes · 1 answer · 102 views
KV caching for varying-length texts
I am trying to do some structured text extraction using some KV caching tricks. For this example I will use the following model and data:
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = ...
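For context, the basic prefix-caching pattern in 🤗 Transformers looks roughly like this (a sketch assuming a causal LM; the prompt strings are placeholders, and with varying-length texts the cache must match each sequence's actual prefix):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prefix = tok("Extract the person's name from:", return_tensors="pt")
with torch.no_grad():
    out = model(**prefix, use_cache=True)           # build the KV cache for the shared prefix
cache = out.past_key_values

suffix = tok(" Alice went to the market.", return_tensors="pt", add_special_tokens=False)
attn_mask = torch.ones(1, prefix.input_ids.shape[1] + suffix.input_ids.shape[1], dtype=torch.long)
with torch.no_grad():
    out2 = model(input_ids=suffix.input_ids, attention_mask=attn_mask,
                 past_key_values=cache, use_cache=True)   # reuse the cached prefix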
0 votes · 1 answer · 98 views
TensorFlow executes slowly on GPU - retracing issue?
I am trying to develop a transformer sequence-to-vector model but am running into performance issues.
I am working with a Tesla V100-PCIE-16GB. Whenever the model encounters an unseen sequence length, the (...
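Two common mitigations for per-shape retracing, for reference: pad/bucket inputs to a small set of fixed lengths, or compile with a relaxed input signature so a new sequence length does not trigger a new trace. A small sketch with a stand-in model and a hypothetical feature size:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(16)])   # stand-in for the transformer

@tf.function(input_signature=[tf.TensorSpec(shape=[None, None, 128], dtype=tf.float32)])
def predict_step(x):
    # the None dimensions (batch, sequence length) keep TF from retracing for every new shape
    return model(x, training=False)

print(predict_step(tf.random.normal([2, 50, 128])).shape)
print(predict_step(tf.random.normal([2, 73, 128])).shape)   # same trace, different length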
0 votes · 1 answer · 261 views
Exploding Gradient (NaN Training Loss and Validation Loss) in Multi-Head Self-Attention - Vision Transformer
This multi-head self-attention code causes the training loss and validation loss to become NaN, but when I remove this part, everything goes back to normal. I know that when the training loss and ...
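For comparison, a minimal scaled dot-product attention; a frequent culprit for NaNs in hand-rolled attention is a missing 1/sqrt(d_k) scaling (or an unstable softmax), though that is only a guess without seeing the code:

import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: [batch, heads, tokens, head_dim]
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # scaling keeps the logits in a stable range
    weights = torch.softmax(scores, dim=-1)
    return weights @ v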