Date(s) - 16/10/2017
11:00 am - 12:00 pm
Studio Villa Bosch
Prof. Dr. Gerhard Jäger, University of Tübingen, Institute for Linguistics
From Words to Features to Trees: Computing a World Tree of Languages from Word Lists
For over 200 years, historical linguists have striven to reconstruct family trees of human languages through systematic comparison of the vocabulary and grammar of extant or documented languages. For about 20 years, these efforts have been complemented by computational approaches that deploy phylogenetic inference algorithms from computational biology to analyse language data.
So far, both lines of research have been confined to individual language families, i.e., phylogenetic units with a time depth of at most 10,000 years.
In this talk I will present and discuss a workflow that starts out from unannotated word lists from ca. 6,000 languages and dialects across the world. Feature extraction techniques from machine learning are used to derive a feature matrix, which in turn serves as input for maximum-likelihood phylogenetic inference (using the software RAxML). This yields a phylogenetic tree over those languages and dialects that is in very good agreement with expert classifications, correlates well with anthropological and genetic data, and also reveals some interesting deeper signals.
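As a rough illustration of the data flow only (word lists, then a binary feature matrix, then an input file for phylogenetic inference), the following minimal Python sketch may help. It is not the speaker's method: the toy word lists, the function names and the very crude feature definition (whether a symbol occurs in a language's word for a concept) are all assumptions made for this example, standing in for the machine-learning feature extraction mentioned in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data (an assumption for this sketch):
# language -> concept -> transcribed word form.
WORD_LISTS: Dict[str, Dict[str, str]] = {
    "German": {"hand": "hant", "water": "vasa"},
    "English": {"hand": "hEnd", "water": "wat3r"},
    "Spanish": {"hand": "mano"},  # no entry for "water"
}

Feature = Tuple[str, str]  # (concept, symbol)


def build_feature_matrix(
    word_lists: Dict[str, Dict[str, str]],
) -> Tuple[List[Feature], Dict[str, str]]:
    """Encode each language as a string over '0', '1' and '?'.

    A feature is a (concept, symbol) pair. Its value is '1' if the symbol
    occurs in the language's word for that concept, '0' if it does not,
    and '?' if the language has no word recorded for the concept.
    """
    features = sorted(
        {
            (concept, symbol)
            for words in word_lists.values()
            for concept, word in words.items()
            for symbol in word
        }
    )
    matrix: Dict[str, str] = {}
    for language, words in word_lists.items():
        row = []
        for concept, symbol in features:
            if concept not in words:
                row.append("?")  # missing data
            else:
                row.append("1" if symbol in words[concept] else "0")
        matrix[language] = "".join(row)
    return features, matrix


def to_phylip(matrix: Dict[str, str]) -> str:
    """Serialise the matrix in relaxed PHYLIP format (names without spaces)."""
    n_chars = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_chars}"]
    lines += [f"{language} {row}" for language, row in matrix.items()]
    return "\n".join(lines) + "\n"


features, matrix = build_feature_matrix(WORD_LISTS)
phylip_text = to_phylip(matrix)
print(phylip_text)
```

Saved to a file such as features.phy (a hypothetical name), a matrix like this could then be passed to RAxML with a binary substitution model, for instance `raxmlHPC -s features.phy -n toytree -m BINGAMMA -p 12345`; the model and options suited to real data should be taken from the RAxML manual.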
Gerhard Jäger (http://www.sfs.uni-tuebingen.de/~gjaeger/) is professor of General Linguistics at Tübingen University. He received his PhD and habilitation from Humboldt University of Berlin and held previous positions at Munich, UPenn, Utrecht and Stanford. He is PI of the ERC Advanced Grant “Language Evolution: The Empirical Turn” and co-PI of the interdisciplinary DFG Research Unit “Words, Bones, Genes, Tools: Tracking Linguistic, Cultural and Biological Trajectories of the Human Past”.
His research interests include computational historical linguistics and game-theoretic pragmatics.
For registration please contact Benedicta Frech: email@example.com