
Regensburg 2025 – scientific programme


AKPIK: Arbeitskreis Physik, moderne Informationstechnologie und Künstliche Intelligenz (Working Group on Physics, Modern Information Technology and Artificial Intelligence)

AKPIK 2: Machine Learning Prediction and Optimization Tasks

AKPIK 2.1: Talk

Tuesday, March 18, 2025, 09:30–09:45, H5

Attention space geometry — •Claudius Gros — Institute for Theoretical Physics, Goethe University Frankfurt

Attention involves comparing query and key vectors via a scalar product, Q·K, followed by a softmax normalization. Classically, parallel/orthogonal/anti-parallel queries and keys lead to large/intermediate/small attention weights. Here we study expressive attention (EA), which is based on (Q·K)², the squared dot product. In this case attention is enhanced when query and key are either parallel or anti-parallel, and suppressed for orthogonal configurations. For a series of auto-regressive prediction tasks, we find that EA performs at least as well as the standard mechanism, dot-product attention (DPA). As task complexity increases, EA is observed to outperform DPA by growing margins, which also holds for multi-task settings. For a given model size, EA achieves 100% performance for a range of complexity levels not accessible to DPA.
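A minimal NumPy sketch of the two mechanisms may help to fix the notation. The √d scaling for DPA is the usual transformer convention and the 1/d scaling for EA is an illustrative assumption, not a specific of the talk; all function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dpa(Q, K, V):
    # standard dot-product attention (DPA): scores from Q.K,
    # scaled by sqrt(d) as in the usual transformer convention
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def ea(Q, K, V):
    # expressive attention (EA): scores from (Q.K)^2, so parallel
    # and anti-parallel query/key pairs both receive large weights,
    # while orthogonal pairs are suppressed; the 1/d scaling is an
    # assumption, not taken from the talk
    scores = (Q @ K.T) ** 2 / Q.shape[-1]
    return softmax(scores) @ V

# geometry check: one query against parallel, orthogonal,
# and anti-parallel keys
q = np.array([[1.0, 0.0]])
K = np.array([[1.0, 0.0],     # parallel
              [0.0, 1.0],     # orthogonal
              [-1.0, 0.0]])   # anti-parallel
V = np.eye(3)  # one-hot values make the output equal the weights

print(dpa(q, K, V))  # ~ [0.58, 0.28, 0.14]: large / intermediate / small
print(ea(q, K, V))   # ~ [0.38, 0.23, 0.38]: parallel and anti-parallel dominate
```

With one-hot values the outputs equal the attention weights themselves, so the parallel/orthogonal/anti-parallel pattern described in the abstract is directly visible in the printed rows.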

Keywords: attention; transformer; time series prediction
