What's the difference between a "self-attention mechanism" and a "fully connected" layer?


I am confused by these two structures. In theory, the output of each one is connected to all of its inputs. What makes the 'self-attention mechanism' more powerful than a fully connected layer?

hkchengrex (accepted answer)

Ignoring details like normalization, biases, and so on, a fully connected layer uses fixed weights:

f(x) = σ(Wx)

where W is learned during training and fixed at inference, and σ is some fixed nonlinearity.
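To make that concrete, here is a minimal PyTorch sketch (my own illustration, not part of the original answer): the same learned matrix W multiplies every input.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Fully connected layer: one weight matrix W, learned in training,
    # then reused unchanged for every input at inference time.
    fc = nn.Linear(8, 8, bias=False)

    x1 = torch.randn(8)
    x2 = torch.randn(8)

    y1 = fc(x1)  # y1 = W @ x1
    y2 = fc(x2)  # y2 = W @ x2, computed with the exact same W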

Self-attention layers are dynamic, recomputing their weights from the input as it changes:

attn(x) = σ(Wx)
f(x) = attn(x) * x
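For comparison, here is a minimal single-head scaled dot-product self-attention sketch in PyTorch (one common variant among many, and again my own illustration rather than the answer's exact formula):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)

    d = 8
    # Query/key/value projections: these weights are fixed after training,
    # just like in the fully connected case.
    W_q = nn.Linear(d, d, bias=False)
    W_k = nn.Linear(d, d, bias=False)
    W_v = nn.Linear(d, d, bias=False)

    x = torch.randn(5, d)  # a sequence of 5 tokens

    q, k, v = W_q(x), W_k(x), W_v(x)

    # The attention matrix is recomputed from the current input, so the
    # effective mixing weights change whenever x changes.
    attn = F.softmax(q @ k.T / d ** 0.5, dim=-1)  # shape (5, 5)
    out = attn @ v                                # input-dependent mixing

The learned parameters (W_q, W_k, W_v) are still static; what changes per input is the mixing matrix attn, and that input-dependence is the extra "magic" a plain fully connected layer does not have.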

Again, this glosses over a lot of details; there are many different implementations for different applications, so you should check the specific paper for the exact formulation.