AKPIK: Working Group on Physics, Modern Information Technology and Artificial Intelligence (Arbeitskreis Physik, moderne Informationstechnologie und Künstliche Intelligenz)
AKPIK 2: Machine Learning Prediction and Optimization Tasks
AKPIK 2.1: Talk
Tuesday, 18 March 2025, 09:30–09:45, H5
Attention space geometry — •Claudius Gros — Institute for Theoretical Physics, Goethe University Frankfurt
Attention involves comparing query and key vectors in terms of a scalar product, Q·K, together with a subsequent softmax normalization. Classically, parallel/orthogonal/anti-parallel queries and keys lead to large/intermediate/small attention weights. Here we study expressive attention (EA), which is based on (Q·K)², the squared dot product. In this case attention is enhanced when query and key are either parallel or anti-parallel, and suppressed for orthogonal configurations. For a series of auto-regressive prediction tasks, we find that EA performs at least as well as the standard mechanism, dot-product attention (DPA). As task complexity increases, EA is observed to outperform DPA by increasing margins, which also holds for multi-task settings. For a given model size, EA achieves 100% performance for a range of complexity levels not accessible to DPA.
Keywords: attention; transformer; time series prediction
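
The contrast described in the abstract can be illustrated with a minimal sketch: standard dot-product attention (DPA) scores queries against keys via Q·K before a softmax, while expressive attention (EA) uses the squared dot product (Q·K)². The softmax applied to the squared scores and the 1/√d-style scaling below are assumptions made for illustration; the paper may normalize the EA scores differently.

```python
# Minimal sketch contrasting DPA and EA attention weights (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dpa_weights(Q, K):
    """Standard dot-product attention: softmax(Q K^T / sqrt(d))."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores)

def ea_weights(Q, K):
    """Expressive attention sketch: scores are the squared dot products (Q.K)^2,
    so parallel and anti-parallel query/key pairs are both enhanced, while
    orthogonal pairs are suppressed. Scaling and softmax are assumptions."""
    d = Q.shape[-1]
    scores = (Q @ K.T) ** 2 / d
    return softmax(scores)

# Toy example: one query attending to a parallel, an orthogonal,
# and an anti-parallel key.
Q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0],    # parallel
              [0.0, 1.0],    # orthogonal
              [-1.0, 0.0]])  # anti-parallel
print("DPA:", dpa_weights(Q, K))  # large / intermediate / small
print("EA :", ea_weights(Q, K))   # large / small / large
```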