
Regensburg 2025 – scientific programme


AKPIK: Arbeitskreis Physik, moderne Informationstechnologie und Künstliche Intelligenz (Working Group on Physics, Modern Information Technology and Artificial Intelligence)

AKPIK 2: Machine Learning Prediction and Optimization Tasks

AKPIK 2.1: Talk

Tuesday, March 18, 2025, 09:30–09:45, H5

Attention space geometry — •Claudius Gros — Institute for Theoretical Physics, Goethe University Frankfurt

Attention involves comparing query and key vectors via a scalar product, Q·K, followed by a softmax normalization. Classically, parallel/orthogonal/anti-parallel queries and keys lead to large/intermediate/small attention weights. Here we study expressive attention (EA), which is based on (Q·K)², the squared dot product. In this case attention is enhanced when query and key are either parallel or anti-parallel, and suppressed for orthogonal configurations. For a series of auto-regressive prediction tasks, we find that EA performs at least as well as the standard mechanism, dot-product attention (DPA). As task complexity increases, EA is observed to outperform DPA by growing margins, which also holds for multi-task settings. For a given model size, EA achieves 100% performance for a range of complexity levels not accessible to DPA.
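A minimal NumPy sketch of the two mechanisms may help to fix the notation. The √d scaling for DPA is the usual transformer convention and the 1/d scaling for EA is an illustrative assumption, not a specific of the talk; all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dpa(Q, K, V):
    # standard dot-product attention (DPA): scores from Q.K,
    # scaled by sqrt(d) as in the usual transformer convention
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def ea(Q, K, V):
    # expressive attention (EA): scores from (Q.K)^2, so parallel
    # and anti-parallel query/key pairs both receive large weights,
    # while orthogonal pairs are suppressed; the 1/d scaling is an
    # assumption, not taken from the talk
    scores = (Q @ K.T) ** 2 / Q.shape[-1]
    return softmax(scores) @ V

# geometry check: one query against parallel, orthogonal,
# and anti-parallel keys
q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0],     # parallel
              [0.0, 1.0],     # orthogonal
              [-1.0, 0.0]])   # anti-parallel
V = np.eye(3)  # one-hot values make the output equal the weights

print(dpa(q, K, V))  # ~ [0.58, 0.28, 0.14]: large / intermediate / small
print(ea(q, K, V))   # ~ [0.38, 0.23, 0.38]: parallel and anti-parallel dominate
```

With one-hot values the outputs equal the attention weights themselves, so the parallel/orthogonal/anti-parallel pattern described in the abstract is directly visible in the printed rows.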

Keywords: attention; transformer; time series prediction
