Are the WQ, WK, WV matrices used to generate the query, key, and value vectors for attention in Transformers fixed, or do WQ, WK, and WV depend on the input word?


To calculate self-attention, we create a Query vector, a Key vector, and a Value vector for each word. These vectors are created by multiplying the word's embedding by three matrices, WQ, WK, and WV, that are learned during training.
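For reference, here is a minimal NumPy sketch of the projection step described above. The dimensions d_model and d_k and the random values are illustrative assumptions, not values taken from the paper; it only shows the shape of the computation, not an answer to the question below.

```python
import numpy as np

# Illustrative sizes (assumptions): embedding size and per-head Q/K/V size.
d_model, d_k = 512, 64
rng = np.random.default_rng(0)

# Projection matrices WQ, WK, WV (random stand-ins for learned weights).
W_Q = rng.standard_normal((d_model, d_k))
W_K = rng.standard_normal((d_model, d_k))
W_V = rng.standard_normal((d_model, d_k))

# Embeddings for a sequence of 3 words, one row per word.
X = rng.standard_normal((3, d_model))

# Each word's embedding is multiplied by the three matrices
# to produce its query, key, and value vectors.
Q = X @ W_Q   # shape (3, d_k)
K = X @ W_K   # shape (3, d_k)
V = X @ W_V   # shape (3, d_k)
```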

Question: are these matrices WQ, WK, WV the same for every input word (embedding), or are they different for different words?

Paper link
