Are the WQ, WK, WV matrices used to generate the query, key, and value vectors for attention in Transformers fixed, or do WQ, WK, and WV depend on the input word?


To calculate self-attention, we create a Query vector, a Key vector, and a Value vector for each word. These vectors are created by multiplying the word's embedding by three matrices, WQ, WK, and WV, that are learned during training.
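For reference, here is a minimal NumPy sketch of the projection step described above. The dimensions d_model and d_k and the random values are illustrative assumptions, not values taken from the paper; it only shows the shape of the computation, not an answer to the question below.

```python
import numpy as np

# Illustrative sizes (assumptions): embedding size and per-head Q/K/V size.
d_model, d_k = 512, 64
rng = np.random.default_rng(0)

# Projection matrices WQ, WK, WV (random stand-ins for learned weights).
W_Q = rng.standard_normal((d_model, d_k))
W_K = rng.standard_normal((d_model, d_k))
W_V = rng.standard_normal((d_model, d_k))

# Embeddings for a sequence of 3 words, one row per word.
X = rng.standard_normal((3, d_model))

# Each word's embedding is multiplied by the three matrices
# to produce its query, key, and value vectors.
Q = X @ W_Q   # shape (3, d_k)
K = X @ W_K   # shape (3, d_k)
V = X @ W_V   # shape (3, d_k)
```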

Question: are these matrices WQ, WK, WV the same for every input word (embedding), or are they different for different words?

Paper link
