DPG Verhandlungen

Dresden 2014 – scientific programme


SOE: Physics of Socio-economic Systems Division (Fachverband Physik sozio-ökonomischer Systeme)

SOE 8: Focus Session: Complex Systems Approaches to Language and Communication

SOE 8.3: Talk

Tuesday, April 1, 2014, 11:00–11:15, GÖR 226

Topic models and scaling laws — •Martin Gerlach and Eduardo G. Altmann — Max Planck Institute for the Physics of Complex Systems, Dresden, Germany

In this talk we combine statistical analysis of large text databases with simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. We focus on the well-studied case of vocabulary growth with database size (Heaps' law) and on a novel scaling law that we observe using fluctuation scaling analysis. To explain both scaling laws simultaneously, we show that it is essential to account for the heterogeneity in the vocabulary of different texts by considering topic models (e.g. Latent Dirichlet Allocation). We test our models against three different databases: the Google n-gram database, Wikipedia, and all articles published by PLoS.
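The two measurements named in the abstract can be illustrated with a short Python sketch. This is a toy example under stated assumptions, not the authors' model or code: each topic is a hypothetical random permutation of one shared Zipf rank distribution (exponent 1), each document draws its topic proportions from a symmetric Dirichlet distribution as in the LDA generative process, and all parameter values (vocabulary size 5000, 10 topics, alpha = 0.1, document lengths) are arbitrary choices for illustration. Heaps' exponent λ is estimated from V(n) ∝ n^λ by a log-log least-squares fit, and fluctuation scaling is measured as σ(V) ∝ μ(V)^β across documents of equal length.

```python
import itertools
import math
import random
import statistics
from typing import Hashable, List, Sequence


def zipf_cumulative(vocab_size: int, exponent: float) -> List[float]:
    """Cumulative Zipf weights: sum of r**(-exponent) over ranks 1..r, for r = 1..vocab_size."""
    return list(itertools.accumulate(r ** -exponent for r in range(1, vocab_size + 1)))


def make_topics(n_topics: int, vocab_size: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical toy topics: each maps a Zipf rank to a word id via its own
    random permutation of a shared vocabulary."""
    return [rng.sample(range(vocab_size), vocab_size) for _ in range(n_topics)]


def dirichlet(alpha: float, k: int, rng: random.Random) -> List[float]:
    """Symmetric Dirichlet(alpha) sample built from normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def lda_document(length: int, topics: Sequence[Sequence[int]], cum: Sequence[float],
                 alpha: float, rng: random.Random) -> List[int]:
    """LDA-style generation: theta ~ Dir(alpha); per token, topic z ~ theta, then a word from topic z."""
    theta = dirichlet(alpha, len(topics), rng)
    topic_ids = rng.choices(range(len(topics)), weights=theta, k=length)
    ranks = rng.choices(range(len(cum)), cum_weights=cum, k=length)
    return [topics[z][r] for z, r in zip(topic_ids, ranks)]


def vocabulary_growth(tokens: Sequence[Hashable]) -> List[int]:
    """V(n): number of distinct tokens among the first n tokens, for n = 1..len(tokens)."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def loglog_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = statistics.fmean(lx), statistics.fmean(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


def heaps_exponent(tokens: Sequence[Hashable], start: int = 10) -> float:
    """Estimate lambda in V(n) ~ n**lambda using checkpoints n = start * 2**k."""
    growth = vocabulary_growth(tokens)
    checkpoints = []
    n = start
    while n <= len(tokens):
        checkpoints.append(n)
        n *= 2
    return loglog_slope(checkpoints, [growth[m - 1] for m in checkpoints])


def fluctuation_exponent(lengths: Sequence[int], n_docs: int, topics: Sequence[Sequence[int]],
                         cum: Sequence[float], alpha: float, rng: random.Random) -> float:
    """Estimate beta in std(V) ~ mean(V)**beta, where V is the vocabulary size of
    one document and the statistics run over n_docs documents of each length."""
    means, stds = [], []
    for length in lengths:
        sizes = [len(set(lda_document(length, topics, cum, alpha, rng))) for _ in range(n_docs)]
        means.append(statistics.fmean(sizes))
        stds.append(statistics.pstdev(sizes))
    return loglog_slope(means, stds)


if __name__ == "__main__":
    rng = random.Random(2014)
    vocab_size, n_topics, alpha = 5000, 10, 0.1  # hypothetical parameters
    topics = make_topics(n_topics, vocab_size, rng)
    cum = zipf_cumulative(vocab_size, 1.0)
    # Concatenate 200 toy documents of 500 tokens each into one corpus.
    corpus = [w for _ in range(200) for w in lda_document(500, topics, cum, alpha, rng)]
    print("Heaps exponent:", round(heaps_exponent(corpus), 3))
    beta = fluctuation_exponent([100, 1000, 10000], 30, topics, cum, alpha, rng)
    print("fluctuation-scaling exponent:", round(beta, 3))
```

Setting `n_topics = 1` turns the generator into a single homogeneous Zipf source, so rerunning the script with one topic and with many topics is one simple way to explore how vocabulary heterogeneity across texts affects both fitted exponents in this toy setting.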
