Berlin 2012 – scientific programme
SOE: Physics of Socio-economic Systems Division
SOE 13: Communication and Language
SOE 13.5: Talk
Wednesday, March 28, 2012, 15:00–15:15, H 0110
Burstiness and long-range correlation in natural language — •Eduardo G. Altmann — Max Planck Institute for the Physics of Complex Systems
Recent temporal analyses of large-scale databases of human activity show that two ubiquitous patterns are the intermittent occurrence of events (burstiness) and correlations over arbitrarily long times. Natural language is a prominent human activity that not only creates these temporal patterns but also reproduces the patterns of external events. Here we perform a detailed analysis of the burstiness and correlations of literary texts. We show how these two phenomena relate to each other on different linguistic scales. In particular, we explain the correlations observed in different low-level encodings (ASCII, letters, vowels, etc.) by tracing their origin to the burstiness of specific words. We discuss how this burstiness depends on the semantics of the words and on the authors of the texts, and how it can be used in practical applications such as document classification and authorship recognition. Our framework of analysis is general and can also be applied to other hierarchical systems.
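As an illustration only, the burstiness of a single word can be estimated from the distances between its successive occurrences in a text. The minimal Python sketch below assumes a whitespace-tokenized text and uses the coefficient B = (σ − μ)/(σ + μ) of Goh and Barabási, where μ and σ are the mean and standard deviation of those distances. This is a commonly used, swapped-in measure and not necessarily the one used in this talk; the function names `recurrence_distances` and `burstiness` are hypothetical.

```python
from statistics import mean, pstdev


def recurrence_distances(tokens, word):
    """Distances (in tokens) between successive occurrences of `word`."""
    positions = [i for i, token in enumerate(tokens) if token == word]
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def burstiness(distances):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu).

    Assumption: this is the Goh-Barabasi coefficient, chosen for illustration.
    B is -1 for perfectly regular spacing, close to 0 for Poisson-like
    spacing, and approaches 1 for strongly bursty spacing.
    """
    if len(distances) < 2:
        raise ValueError("need at least two recurrence distances")
    mu = mean(distances)
    sigma = pstdev(distances)  # population standard deviation
    return (sigma - mu) / (sigma + mu)


# Toy data, not taken from any real text.
regular_text = "a b a b a b a b".split()
clustered_text = ["a", "a", "a"] + ["b"] * 47 + ["a"]

regular_b = burstiness(recurrence_distances(regular_text, "a"))
clustered_b = burstiness(recurrence_distances(clustered_text, "a"))
```

In this toy example the evenly spaced word yields the minimum value of B, while the clustered word yields a positive value. On real texts one would compare such values across words, for instance between function words and topic-specific words.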