Sean Roberts presenting on "Tools for comparative studies of diversity", 12 August 2015

Australian National University, Evolution

Date: 12 August 2015

When: 4pm, 12 August

Where: Engma Room, Coombs Building, The Australian National University

There are two basic problems in explaining why some communities have more linguistic diversity than others. First, how to measure diversity in a way that is suitable for comparison across cultures. Second, how to identify causes and correlates when there are so many variables, many of which are correlated with each other.

The concept of linguistic diversity is tricky. Even seemingly straightforward measures, such as the number of languages a person or community speaks, often depend on the particular political or historical context of the speech community. This makes it difficult to compare communities. I'll review arguments that languages are not discrete, monolithic entities that can be counted, especially when looking at long-term diachronic change. In response, I'll suggest that diversity can be measured as the amount of variation in one speaker's speech that can be explained by who they are talking to.
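One way to make this proposal concrete, offered only as a sketch under assumptions the abstract does not state: take a single numeric feature of one speaker's speech (for example, a hypothetical proportion of utterances that use a particular variant, recorded per conversation) and compute the share of its variance that is accounted for by the identity of the interlocutor. Here that share is eta squared, the between-group sum of squares divided by the total sum of squares. The choice of feature, the use of eta squared, and the function name `variance_explained_by_interlocutor` are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def variance_explained_by_interlocutor(observations):
    """Return eta squared for one speaker: the share of variance in a
    speech measure that is explained by who they are talking to.

    observations: list of (interlocutor, value) pairs, where value is a
    hypothetical numeric speech feature measured in one conversation.
    """
    values = [v for _, v in observations]
    grand_mean = mean(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variation at all, so nothing to explain
    # Group the measurements by interlocutor.
    groups = defaultdict(list)
    for who, v in observations:
        groups[who].append(v)
    # Between-group sum of squares: how far each interlocutor's mean
    # sits from the speaker's overall mean, weighted by group size.
    ss_between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    return ss_between / ss_total
```

A speaker who switches completely depending on the addressee, e.g. `[("A", 1.0), ("A", 1.0), ("B", 0.0), ("B", 0.0)]`, scores 1.0; a speaker whose variation is unrelated to the addressee, e.g. `[("A", 1.0), ("A", 0.0), ("B", 1.0), ("B", 0.0)]`, scores 0.0.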

Once diversity is measured, there is still the problem of explaining differences. I'll cover two statistical methods that might help. The first is a machine learning method called random forests, which identifies the factors that best explain variation in a dependent variable. Its advantages are that it presents results in a way that is easy to interpret as rule-like, and that it handles collinearity and small sample sizes well.
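To illustrate what "identifying important factors" can mean in practice, here is a minimal sketch of a random forest for a numeric dependent variable, written in plain Python so every step is visible. Each tree is grown on a bootstrap sample, each split considers only `mtry` randomly chosen predictors, and every split's reduction in squared error is credited to the predictor it used (impurity-based importance). The function names, the defaults (`n_trees=50`, `mtry=1`, `max_depth=3`) and the choice of impurity rather than permutation importance are assumptions for illustration, not the speaker's setup.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (the impurity of a node)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def grow(rows, ys, depth, rng, mtry, max_depth, importance):
    """Recursively grow one regression tree.

    Returns a leaf value (the mean outcome) or a tuple
    (feature, threshold, left_subtree, right_subtree). Each split's
    reduction in squared error is added to importance[feature].
    """
    node_sse = sse(ys)
    if depth == max_depth or len(ys) < 4 or node_sse == 0:
        return sum(ys) / len(ys)
    best = None  # (gain, feature, threshold)
    # Only a random subset of mtry predictors is considered at this split.
    for f in rng.sample(range(len(rows[0])), mtry):
        # Dropping the largest value guarantees both children are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            gain = node_sse - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None:  # the sampled predictors are constant in this node
        return sum(ys) / len(ys)
    gain, f, t = best
    importance[f] += gain
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (
        f,
        t,
        grow([r for r, _ in left], [y for _, y in left], depth + 1, rng, mtry, max_depth, importance),
        grow([r for r, _ in right], [y for _, y in right], depth + 1, rng, mtry, max_depth, importance),
    )


def forest_importance(rows, ys, n_trees=50, mtry=1, max_depth=3, seed=0):
    """Fit a small random forest; return (trees, normalised importances)."""
    rng = random.Random(seed)
    n = len(rows)
    importance = [0.0] * len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        trees.append(grow([rows[i] for i in idx], [ys[i] for i in idx],
                          0, rng, mtry, max_depth, importance))
    total = sum(importance)
    return trees, ([imp / total for imp in importance] if total else importance)
```

With synthetic data in which the outcome depends only on the first of three predictors, most of the normalised importance should fall on that first predictor. Each tree is also a set of threshold splits, which is where the rule-like reading of the results comes from.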

But what if you want to explore a network of interacting variables? The second method involves causal graph inference. Given some observational data, this method produces the most likely graph of causal connections between variables. It also produces easy-to-understand results and is robust to collinearity.

Rather than giving a detailed explanation of the mathematical workings of these models, I hope to engage people in discussion about the general principles and potential benefits of these approaches.


Sean Roberts looks at how individual cognition and interaction are related to population-level phenomena. He also looks at how large-scale, cross-cultural statistics are used to motivate research. His first language is Welsh.

He has an MA in Linguistics and Artificial Intelligence and an MSc in the Evolution of Language from the University of Edinburgh. He wrote his PhD on evolutionary approaches to bilingualism at the Language Evolution and Computation research unit.

He blogs about language evolution at Replicated Typo.

