Dresden 2014 – Scientific Programme
SOE: Physics of Socio-economic Systems Division
SOE 8: Focus Session: Complex Systems Approaches to Language and Communication
SOE 8.3: Talk
Tuesday, 1 April 2014, 11:00–11:15, GÖR 226
Topic models and scaling laws — •Martin Gerlach and Eduardo G. Altmann — Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
In this talk we combine statistical analysis of large text databases with simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. We focus on the well-studied growth of vocabulary with database size (Heaps' law) and on a novel scaling law that we observe using fluctuation scaling analysis. To explain both scaling laws simultaneously, we show that it is essential to account for the heterogeneity in the vocabulary of texts by considering topic models (e.g. Latent Dirichlet Allocation). Our models are tested against three databases: the Google n-gram database, Wikipedia, and all articles published by PLoS.
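
As a rough illustration of the vocabulary growth curve behind Heaps' law, the sketch below counts the number of distinct words V(N) after the first N tokens and estimates a growth exponent from a log-log least-squares fit. It is not the authors' method: the function names `vocabulary_growth` and `heaps_exponent`, the whitespace tokenization, and the toy sentence are all assumptions made for this example, and a single power-law fit is only a crude stand-in for the models discussed in the talk.

```python
import math
from typing import Iterable, List


def vocabulary_growth(tokens: Iterable[str]) -> List[int]:
    """Return V(N): the number of distinct words among the first N tokens."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def heaps_exponent(growth: List[int]) -> float:
    """Least-squares slope of log V(N) against log N (hypothetical helper).

    Assumes V(N) ~ N**slope over the whole range, which is a simplification.
    """
    if len(growth) < 2:
        raise ValueError("need at least two points to fit a slope")
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Toy input, tokenized by whitespace (an assumption for illustration only).
text = "the cat sat on the mat and the dog sat on the cat"
curve = vocabulary_growth(text.split())
print(curve)  # [1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 7, 7]
```

On a real corpus the curve would be computed over millions of tokens. A fitted exponent below 1 indicates sublinear vocabulary growth, which is the behaviour Heaps' law describes.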