PyTorch MultiHeadAttention implementation

Question

In PyTorch's MultiheadAttention implementation, regarding in_proj_weight, is it true that the first embed_dim rows correspond to the query, the next embed_dim rows correspond to the key, and the final embed_dim rows correspond to the value? Just confirming.

This question was asked in the same context, but it doesn't answer my specific question.

Karl · Accepted Answer · 2024-02-16 03:25:52Z

Yes, that is the case.

You can see how in_proj_weight is used in the _in_projection_packed function in torch.nn.functional. Its docstring describes the packed weight argument as:

"projection weights for q, k and v, packed into a single tensor. Weights are packed along dimension 0, in q, k, v order."
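
To check the packing order yourself, here is a minimal sketch, assuming the default case where kdim and vdim equal embed_dim (otherwise the module keeps separate q_proj_weight, k_proj_weight and v_proj_weight tensors instead of in_proj_weight). It slices in_proj_weight into three blocks along dimension 0, recomputes self-attention by hand in q, k, v order, and compares the result with the module's own output. The sizes and the names w_q, w_k, w_v, split_heads and matches are chosen only for illustration.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_heads = 8, 2          # illustrative sizes
head_dim = embed_dim // num_heads

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
mha.eval()

# in_proj_weight has shape (3 * embed_dim, embed_dim).
# Split it along dimension 0, assuming q, k, v order.
w_q, w_k, w_v = mha.in_proj_weight.chunk(3, dim=0)
b_q, b_k, b_v = mha.in_proj_bias.chunk(3, dim=0)

x = torch.randn(1, 5, embed_dim)     # (batch, seq_len, embed_dim)


def split_heads(t):
    # (batch, seq_len, embed_dim) -> (batch, num_heads, seq_len, head_dim)
    b, s, _ = t.shape
    return t.view(b, s, num_heads, head_dim).transpose(1, 2)


with torch.no_grad():
    q = split_heads(F.linear(x, w_q, b_q))
    k = split_heads(F.linear(x, w_k, b_k))
    v = split_heads(F.linear(x, w_v, b_v))

    # Scaled dot-product attention per head.
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    attn = torch.softmax(scores, dim=-1) @ v

    # Merge heads back and apply the output projection.
    attn = attn.transpose(1, 2).reshape(1, 5, embed_dim)
    manual = mha.out_proj(attn)

    expected, _ = mha(x, x, x, need_weights=False)

# True only if the q, k, v slicing above matches what the module does.
matches = torch.allclose(manual, expected, atol=1e-5)
print(matches)
```

If the blocks were assigned in a different order (for example, treating the first block as the key weights), the manual result would generally no longer agree with the module's output.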
