AMD interview question

How does the self-attention layer work in transformers?
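
A short sketch may help frame an answer. In the usual formulation, each token embedding is projected into a query, a key, and a value; every query is scored against every key, the scores are divided by sqrt(d_k) and passed through a softmax, and the resulting weights are used to average the values: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. The code below is a minimal single-head illustration in plain Python, not a production implementation. The names `self_attention`, `matmul`, `transpose`, `softmax`, `w_q`, `w_k`, and `w_v` are assumptions made for this example, and multi-head splitting, masking, and the output projection are left out.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:   (seq_len, d_model) token embeddings
    w_q: (d_model, d_k) query projection (hypothetical name)
    w_k: (d_model, d_k) key projection (hypothetical name)
    w_v: (d_model, d_v) value projection (hypothetical name)
    Returns (output, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    q = matmul(x, w_q)  # queries, one row per token
    k = matmul(x, w_k)  # keys, one row per token
    v = matmul(x, w_v)  # values, one row per token
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (seq_len, seq_len) dot products
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy usage: three 2-dimensional tokens with identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` sums to 1, so every output row is a weighted average of the value vectors. In a trained model the three projection matrices are learned parameters; the identity matrices here only keep the toy example easy to follow.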