Danielle Barth: "Quantitative Corpus Linguistics and Fieldwork Data", 30 October 2015

Australian National University, Shape

Date: 30 October 2015

When: 30 October, 11am-12.30pm

Where: Engma Room, Coombs Building, ANU

This talk discusses assumptions and methods in quantitative corpus linguistics (Gries, 2009), including exploratory data-mining techniques for pattern detection such as random forests (Hothorn et al., 2006a; Hothorn et al., 2006b; Strobl et al., 2007; Strobl et al., 2008; Strobl et al., 2009). Although one million words is traditionally taken as the minimum size for a body of texts to count as a “corpus” (Fang, 1993; Leech, 1991), there are plenty of research questions that can be investigated quantitatively using smaller “corpora” (Barth, 2015; Meyerhoff and Walker, 2012; 2013; Meyerhoff, 2015).

I will present two case studies using data from Matukar Panau (Oceanic, Papua New Guinea). The first is a quantitative exploration of casual and clear lexical variants, with the results situated in the literature on sociolinguistic style and identity (Eckert, 2008; 2012; Pennebaker and Stone, 2003; Podesva, 2007; Zhang, 2008, inter alia). The second is a quantitative exploration of the distribution of directional construction types in Matukar Panau, with discussion of the language-internal motivations for the choice of syntactic construction.

In this talk, I will advocate looking at variation in lesser-studied languages and show that this is possible well before a collection reaches the one-million-word mark traditionally required of a “corpus.”


