0 votes
0 answers
87 views

In the torch example provided here https://github.com/pytorch/examples/tree/main/word_language_model, the transformer only uses torch.nn.TransformerEncoder, and torch.nn.TransformerDecoder is overwritten with a ...
cuneyttyler • 1,395
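For context, the repo's TransformerModel keeps nn.TransformerEncoder and replaces the "decoder" with a plain linear projection to the vocabulary, decoded under a causal mask. A minimal sketch of that pattern (not the repo's exact code; positional encoding omitted and hyperparameters illustrative):

    import math
    import torch
    import torch.nn as nn

    class EncoderOnlyLM(nn.Module):
        """Sketch: nn.TransformerEncoder plus a linear 'decoder' that projects
        hidden states to vocabulary logits (no nn.TransformerDecoder)."""
        def __init__(self, ntoken, d_model=200, nhead=2, nhid=200, nlayers=2, dropout=0.2):
            super().__init__()
            self.embed = nn.Embedding(ntoken, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, nhid, dropout)
            self.encoder = nn.TransformerEncoder(layer, nlayers)
            self.decoder = nn.Linear(d_model, ntoken)   # "decoder" is just a projection
            self.d_model = d_model

        def forward(self, src):                          # src: [seq_len, batch] token ids
            sz = src.size(0)
            # causal mask: each position may only attend to earlier positions
            mask = torch.triu(torch.full((sz, sz), float("-inf")), diagonal=1)
            x = self.embed(src) * math.sqrt(self.d_model)
            h = self.encoder(x, mask=mask)
            return self.decoder(h)                       # [seq_len, batch, ntoken] logits

    logits = EncoderOnlyLM(ntoken=1000)(torch.randint(0, 1000, (35, 8)))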
0 votes
0 answers
157 views

The complete code and data are available at: Google Disk. I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
氢氰酸
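For readers without the linked files: a minimal sketch of a Transformer-encoder regressor for a high-dimensional target, assuming batch-first float features (layer sizes and mean pooling are illustrative assumptions, not the asker's model):

    import torch
    import torch.nn as nn

    class TransformerRegressor(nn.Module):
        """Sketch: project features, run a Transformer encoder, pool over the
        sequence, and regress to a high-dimensional target."""
        def __init__(self, in_dim, target_dim, d_model=128, nhead=4, nlayers=3):
            super().__init__()
            self.proj = nn.Linear(in_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, nlayers)
            self.head = nn.Linear(d_model, target_dim)

        def forward(self, x):                       # x: [batch, seq_len, in_dim]
            h = self.encoder(self.proj(x))          # [batch, seq_len, d_model]
            return self.head(h.mean(dim=1))         # mean-pool -> [batch, target_dim]

    model = TransformerRegressor(in_dim=64, target_dim=500)
    loss = nn.MSELoss()(model(torch.randn(4, 32, 64)), torch.randn(4, 500))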
1 vote
1 answer
115 views

In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
Blockchain Kid
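A minimal sketch of what "multiplying a similarity matrix into the attention scores" can look like, assuming the prior is combined with the raw scores before the softmax (the paper's exact combination rule may differ):

    import torch
    import torch.nn.functional as F

    def attention_with_prior(q, k, v, prior):
        """Sketch: fold a prior token-similarity matrix into raw attention scores
        before the softmax. q, k, v: [batch, heads, seq, head_dim];
        prior: [batch, 1, seq, seq] (broadcast over heads)."""
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # [b, h, s, s]
        scores = scores * prior              # one possible way to apply the prior
        weights = F.softmax(scores, dim=-1)
        return weights @ v

    b, h, s, d = 2, 8, 16, 32
    q, k, v = (torch.randn(b, h, s, d) for _ in range(3))
    prior = torch.rand(b, 1, s, s)
    out = attention_with_prior(q, k, v, prior)   # [2, 8, 16, 32]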
0 votes
1 answer
44 views

I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation. So, to check what equivariant and permutation invariant mean, I wrote the following code: import torch import torch....
fenaux • 47
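Since the code in the excerpt is truncated, here is a minimal check of both definitions, assuming P acts by permuting the rows (token order):

    import torch

    torch.manual_seed(0)
    x = torch.randn(5, 3)                      # 5 "tokens", 3 features each
    perm = torch.randperm(5)
    P = lambda t: t[perm]                      # P permutes the rows (token order)

    f_equi = lambda t: t * 2 + 1               # per-row op -> permutation equivariant
    f_inv = lambda t: t.sum(dim=0)             # reduction over rows -> permutation invariant

    print(torch.allclose(f_equi(P(x)), P(f_equi(x))))   # True: f(P(x)) == P(f(x))
    print(torch.allclose(f_inv(P(x)), f_inv(x)))        # True: f(P(x)) == f(x)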
0 votes
0 answers
66 views

In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers: import torch import torch.nn as nn class ...
Yilmaz • 51k
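For reference, a minimal sketch of the pattern the excerpt describes, with learned positional embeddings added elementwise to token embeddings before any attention layer (sizes are illustrative):

    import torch
    import torch.nn as nn

    class Embeddings(nn.Module):
        """Token and positional embeddings are summed elementwise, so every
        position carries both 'what the token is' and 'where it sits'."""
        def __init__(self, vocab_size=1000, max_len=512, d_model=64):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)

        def forward(self, ids):                           # ids: [batch, seq_len]
            positions = torch.arange(ids.size(1), device=ids.device)
            return self.tok(ids) + self.pos(positions)    # broadcast add -> [batch, seq_len, d_model]

    emb = Embeddings()(torch.randint(0, 1000, (2, 10)))   # [2, 10, 64]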
0 votes
0 answers
96 views

I am trying to fine-tune a transformer/encoder-based pose estimation model, available here: https://huggingface.co/docs/transformers/en/model_doc/vitpose. When passing the "labels" attribute to ...
Soham Bhaumik
2 votes
1 answer
84 views

The problem: the similarity scores are almost the same for texts describing a photo of a cat and a photo of a dog (the photo is of a cat). Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>) ...
Yousef • 51
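Raw scores such as tensor([[-3.5724]]) are unnormalized logits; with CLIP-style models the comparable quantity is usually the softmax over logits_per_image across the candidate texts. A sketch using the Hugging Face CLIP classes (the checkpoint and the image path are assumptions):

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("cat.jpg")                       # hypothetical local photo of a cat
    texts = ["a photo of a cat", "a photo of a dog"]
    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

    with torch.no_grad():
        logits = model(**inputs).logits_per_image       # [1, 2] image-text similarity logits
    probs = logits.softmax(dim=-1)                      # relative probabilities over the two texts
    print(probs)                                        # the cat caption should dominate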
0 votes
0 answers
78 views

I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
Kabir Juneja
0 votes
0 answers
41 views

I am working on a project to pre-train a custom transformer model I developed and then fine-tune it for a downstream task. I am pre-training the model on an H100 cluster and this is working great. ...
Martin Weiss
0 votes
0 answers
164 views

I am trying to use the bitsandbytes library for 4-bit quantization in my model loading function, but I keep encountering an ImportError. The error message says, "Using bitsandbytes 4-bit ...
from • 1
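That ImportError usually points at a missing or mismatched bitsandbytes/accelerate install rather than the model code itself. For reference, a minimal 4-bit loading sketch with BitsAndBytesConfig (the model name is an assumption; a CUDA GPU and recent bitsandbytes/accelerate are required):

    # Assumes: pip install -U transformers accelerate bitsandbytes  (and a CUDA GPU)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model_name = "mistralai/Mistral-7B-v0.1"            # hypothetical example model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )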
0 votes
0 answers
34 views

When fine-tuning HuBERT to detect phonemes, I started from a fine-tuned ASR HuBERT model, removed the last two layers, and added a linear layer sized to the phoneme vocab_size in the config. What is ...
Ngoc Anh
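One common recipe for this is to keep the ASR checkpoint's encoder and re-initialize only the CTC head with the phoneme vocabulary size. A sketch assuming HubertForCTC and an example checkpoint (the phoneme vocab size is a placeholder):

    from transformers import HubertForCTC

    phoneme_vocab_size = 44           # hypothetical number of phoneme symbols (incl. blank/pad)

    # Reuse the ASR checkpoint's encoder; the CTC head (lm_head) changes shape,
    # so allow it to be re-initialized for the new vocabulary.
    model = HubertForCTC.from_pretrained(
        "facebook/hubert-large-ls960-ft",     # example fine-tuned ASR checkpoint
        vocab_size=phoneme_vocab_size,
        ignore_mismatched_sizes=True,
    )
    print(model.lm_head)                      # Linear(..., out_features=44)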
0 votes
1 answer
181 views

In appendix B of the PaLM paper (https://arxiv.org/pdf/2204.02311) a metric called "Model FLOPs Utilization (MFU)" is described, along with the formula for estimating it. Its computation makes ...
cangozpi • 159
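For reference, appendix B estimates model FLOPs per token as roughly 6N plus a self-attention term (12·L·H·Q·T), and MFU as achieved FLOPs per second divided by the hardware's peak. A small worked sketch; the throughput and hardware numbers below are illustrative assumptions, not measurements:

    def mfu(tokens_per_sec, n_params, n_layers, d_model, seq_len, peak_flops):
        """MFU estimate in the spirit of PaLM appendix B:
        FLOPs/token ~= 6*N (matmuls) + 12*L*H*Q*T (self-attention), with H*Q = d_model."""
        flops_per_token = 6 * n_params + 12 * n_layers * d_model * seq_len
        achieved_flops_per_sec = tokens_per_sec * flops_per_token
        return achieved_flops_per_sec / peak_flops

    # Illustrative numbers only: a ~1.3B model at 20k tokens/s on one A100 (BF16 peak ~312 TFLOPs)
    print(mfu(tokens_per_sec=20_000, n_params=1.3e9, n_layers=24,
              d_model=2048, seq_len=2048, peak_flops=312e12))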
2 votes
1 answer
457 views

General question (hopefully useful for people coming from Google): what to do when the gradient explodes? When working with transformers and deep NNs (with PyTorch), do you have a mental checklist of ...
Nicholas Kryger-Nelson
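One item that is on nearly every such checklist is monitoring and clipping the gradient norm; a minimal sketch:

    import torch

    def training_step(model, batch, loss_fn, optimizer, max_norm=1.0):
        """Training step that monitors and clips the gradient norm.
        Returns the pre-clipping total norm so exploding gradients are visible."""
        optimizer.zero_grad()
        inputs, targets = batch
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        # clip_grad_norm_ returns the total norm computed *before* clipping
        total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        return loss.item(), total_norm.item()

    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    batch = (torch.randn(8, 10), torch.randn(8, 1))
    print(training_step(model, batch, torch.nn.MSELoss(), opt))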
2 votes
0 answers
316 views

Problem (see distil-large-v3#sequential-long-form): I'm using distil-whisper through the 🤗 Transformers pipeline for speech recognition. When setting return_timestamps=True, the timestamps reset to 0 every ...
Martin Zhu
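For reference, one way to call the pipeline with timestamps (the audio path is an assumption; whether timestamps are stitched across windows depends on the chunked vs. sequential long-form decoding path):

    from transformers import pipeline

    asr = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-large-v3",
        chunk_length_s=30,          # chunked long-form; omit to use sequential decoding
    )

    result = asr("long_audio.wav", return_timestamps=True)   # hypothetical local file
    for chunk in result["chunks"]:
        print(chunk["timestamp"], chunk["text"])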
0 votes
0 answers
12 views

I am working on an end-to-end (E2E) project for websites that involves: Capturing Tight Screenshots of Data Tables: The project automatically detects and takes precise screenshots of all the data ...
Michael Dzwinel
1 vote
1 answer
449 views

I have been trying to run TFBertModel from Transformers, but it keeps throwing this error: ValueError Traceback (most recent call last) Cell In[9], line 1 ----> 1 ...
Faiz khan
0 votes
1 answer
681 views

I’m working on an audio recognition task using a Transformer-based model in PyTorch. My input features are generated by a CNN-based embedding layer and have the shape [batch_size, d_model, n_token], ...
MuxAte • 43
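A CNN front-end emits channels-first features of shape [batch, d_model, n_token], while PyTorch transformer layers with batch_first=True expect [batch, n_token, d_model], so a permute is needed in between. A minimal sketch (shapes from the excerpt, layer sizes illustrative):

    import torch
    import torch.nn as nn

    batch_size, d_model, n_token = 4, 256, 100
    cnn_features = torch.randn(batch_size, d_model, n_token)   # CNN output: channels first

    # nn.TransformerEncoder with batch_first=True wants [batch, seq_len, d_model]
    x = cnn_features.permute(0, 2, 1)                           # -> [4, 100, 256]

    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)
    out = encoder(x)                                            # [4, 100, 256]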
0 votes
0 answers
191 views

Does anyone know how to fine-tune the Tacotron2 and WaveGlow models from the NVIDIA pre-trained Tacotron2 and WaveGlow checkpoints? The first thing I did was create my own dataset in the same format as the LJSpeech ...
Izukishi
0 votes
1 answer
49 views

I'm reading a file using a Sequential File stage in DataStage and doing some transformations on the data using a Transformer stage. I want to compare the current row with the previous row, to check a value of ...
Chaimaa Emily
0 votes
0 answers
41 views

I am busy with a forecasting model, and have turned to Transformers to see if they will be able to perform better than other sequence models. I keep getting the error: TypeError ...
Tayla Corney
0 votes
1 answer
36 views

I am trying to budget for setting up an LLM-based RAG application which will serve a dynamic number of users (anything from 100 to 2000). I am able to figure out the GPU requirement to host a certain llm[...
Bing • 631
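Beyond the weights, concurrency mostly costs KV-cache memory, which grows with context length and the number of simultaneous requests. A back-of-the-envelope sketch; all shapes and counts below are illustrative assumptions:

    def kv_cache_gb(n_layers, n_kv_heads, head_dim, seq_len, concurrent_requests, bytes_per_elem=2):
        """KV-cache bytes per token = 2 (K and V) * layers * kv_heads * head_dim * bytes.
        Total = per-token bytes * context length * concurrent requests."""
        per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
        return per_token * seq_len * concurrent_requests / 1e9

    # Illustrative 7B-class model shapes, fp16 cache, 50 concurrent 4k-token requests:
    print(kv_cache_gb(n_layers=32, n_kv_heads=32, head_dim=128,
                      seq_len=4096, concurrent_requests=50))   # ~107 GB -> paging/batching needed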
1 vote
0 answers
34 views

I am trying to write a simple quantized tensor linear multiplication. Assume a weight matrix w3 of shape (14336, 4096) and an input tensor x of shape (2, 512, 4096), where the first dim is ...
hafezmg48
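A minimal reference for those shapes: per-output-row symmetric int8 quantization of the weight, then a dequantize-and-matmul. This is a correctness sketch, not any library's optimized kernel:

    import torch

    x = torch.randn(2, 512, 4096)                 # [batch, seq, in_features]
    w3 = torch.randn(14336, 4096)                 # [out_features, in_features]

    # Per-output-row symmetric int8 quantization of the weight.
    scale = w3.abs().amax(dim=1, keepdim=True) / 127.0                 # [14336, 1]
    w3_q = torch.clamp((w3 / scale).round(), -127, 127).to(torch.int8)

    # "Quantized" linear: dequantize on the fly, then matmul (reference implementation).
    y = x @ (w3_q.float() * scale).t()            # [2, 512, 14336]

    # Error vs. full precision stays small for well-behaved weights.
    print((y - x @ w3.t()).abs().max())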
0 votes
1 answer
102 views

I am trying to do some structured text extraction using some KV caching tricks. For this example I will use the following model and data: model_name = "Qwen/Qwen2.5-0.5B-Instruct" model = ...
sachinruk • 10k
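The usual KV-caching trick here is to run the shared prompt once, keep past_key_values, and reuse it for every continuation. A sketch with the model named in the excerpt (the prompt text and the short greedy loop are assumptions):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "Qwen/Qwen2.5-0.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Encode the shared prefix once and cache its keys/values.
    prefix = tokenizer("Extract the fields from: John Doe, 42, Berlin.\n", return_tensors="pt")
    with torch.no_grad():
        out = model(**prefix, use_cache=True)
    past = out.past_key_values

    # Reuse the cached prefix and continue greedily, one token at a time.
    next_id = out.logits[:, -1].argmax(-1, keepdim=True)
    generated = [next_id]
    with torch.no_grad():
        for _ in range(10):
            out = model(input_ids=next_id, past_key_values=past, use_cache=True)
            past = out.past_key_values
            next_id = out.logits[:, -1].argmax(-1, keepdim=True)
            generated.append(next_id)

    print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))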
0 votes
1 answer
98 views

I am trying to develop a transformer sequence-to-vector model but am encountering performance issues. I am working with a Tesla V100-PCIE-16GB. Whenever the model encounters an unseen sequence length, the (...
D. E. • 1
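A common mitigation for slowdowns on unseen sequence lengths is padding inputs up to a small set of bucket lengths, so kernels or compiled graphs are reused instead of re-tuned for every new shape. A minimal sketch (bucket sizes are illustrative):

    import torch
    import torch.nn.functional as F

    BUCKETS = [32, 64, 128, 256, 512]    # the only lengths the model will actually see

    def pad_to_bucket(x, pad_value=0.0):
        """Pad [batch, seq_len, d_model] up to the next bucket length and return a
        padding mask, so every batch hits one of a few fixed shapes."""
        seq_len = x.size(1)
        target = next(b for b in BUCKETS if b >= seq_len)
        padded = F.pad(x, (0, 0, 0, target - seq_len), value=pad_value)
        mask = torch.zeros(x.size(0), target, dtype=torch.bool)
        mask[:, seq_len:] = True          # True marks padded (ignored) positions
        return padded, mask

    x = torch.randn(4, 100, 256)
    padded, key_padding_mask = pad_to_bucket(x)    # padded: [4, 128, 256]
    # pass key_padding_mask as src_key_padding_mask to an nn.TransformerEncoderLayer
    print(padded.shape, key_padding_mask.shape)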
0 votes
1 answer
261 views

This multi-head self-attention code causes the training loss and validation loss to become NaN, but when I remove this part, everything goes back to normal. I know that when the training loss and ...
Fuji • 117
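Since the attention code itself is not shown in the excerpt, here is a numerically careful reference for scaled dot-product attention: scores are scaled by sqrt(head_dim) and masked positions get a large negative finite value, so the softmax never sees an all -inf row (a common source of NaNs):

    import torch
    import torch.nn.functional as F

    def attention(q, k, v, mask=None):
        """Reference scaled dot-product attention with the usual NaN traps avoided:
        scores scaled by sqrt(head_dim); masked positions filled with a large
        negative finite value so even a fully masked row stays finite."""
        scale = q.size(-1) ** 0.5
        scores = q @ k.transpose(-2, -1) / scale          # [..., seq_q, seq_k]
        if mask is not None:
            scores = scores.masked_fill(mask, -1e4)       # finite, safe for fp16 as well
        return F.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(2, 8, 16, 32)                 # [batch, heads, seq, head_dim]
    mask = torch.zeros(2, 1, 16, 16, dtype=torch.bool)    # True means "mask out"
    out = attention(q, k, v, mask)
    print(torch.isnan(out).any())                         # tensor(False)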
