Danielle Barth: "Quantitative Corpus Linguistics and Fieldwork Data", 30 October 2015
When: 30 October, 11am-12.30pm
Where: Engma Room, Coombs Building, ANU
This talk discusses assumptions and methods in quantitative corpus linguistics (Gries, 2009), including exploratory data-mining techniques for pattern detection such as random forests (Hothorn et al., 2006a; Hothorn et al., 2006b; Strobl et al., 2007; Strobl et al., 2008; Strobl et al., 2009). Although one million words is traditionally treated as the minimum size for a body of texts to count as a “corpus” (Fang, 1993; Leech, 1991), there are plenty of research questions that can be investigated quantitatively using smaller “corpora” (Barth, 2015; Meyerhoff and Walker, 2012; 2013; Meyerhoff, 2015).
I will present two case studies using data from Matukar Panau (Oceanic, Papua New Guinea). The first case study will present a quantitative exploration of casual and clear lexical variants and situate the results in the sociolinguistic style and identity literature (Eckert, 2008; 2012; Pennebaker and Stone, 2003; Podesva, 2007; Zhang, 2008, inter alia). The second case study will present a quantitative exploration of directional construction type distribution in Matukar Panau and discuss language-internal motivations for choice of syntactic construction.
In this talk, I will advocate looking at variation in lesser-studied languages and show that this is possible before reaching the one-million-word mark of what is traditionally termed a “corpus.”
Barth, D. (2015). To have and to be: function word reduction in child speech, child directed speech and inter-adult speech (Doctoral dissertation, University of Oregon).
Eckert, P. (2008). Variation and the indexical field. Journal of Sociolinguistics, 12, 453-476.
Eckert, P. (2012). Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation. Annual Review of Anthropology, 41, 87-100.
Fang, A. C. (1993). Building a corpus of the English of computer science. English Language Corpora: Design, Analysis and Exploitation. Amsterdam and Atlanta, GA: Rodopi, 73-8.
Gries, S. Th. (2009). Quantitative corpus linguistics with R: a practical introduction. Routledge, Taylor and Francis Group.
Hothorn, T., Buehlmann, P., Dudoit, S., Molinaro, A. & Van Der Laan, M. (2006a). Survival ensembles. Biostatistics, 7(3), 355-373.
Hothorn, T., Hornik, K. & Zeileis, A. (2006b). Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651-674.
Leech, G. (1991). The state of the art in corpus linguistics. In Aijmer, K. and Altenberg, B. (eds.), English Corpus Linguistics: Studies in honour of Jan Svartvik. Longman, London, pp. 8-29.
Meyerhoff, M. (2015). Turning variation on its head: Analysing subject prefixes in Nkep (Vanuatu) for language documentation. Asia-Pacific Language Variation, 1(1), 78-108.
Meyerhoff, M., & Walker, J. A. (2012). Grammatical variation in Bequia (St Vincent and the Grenadines). Journal of Pidgin and Creole Languages, 27(2), 209-234.
Meyerhoff, M., & Walker, J. A. (2013). An existential problem: The sociolinguistic monitor and variation in existential constructions on Bequia (St. Vincent and the Grenadines). Language in Society, 42(4), 407-428.
Pennebaker, J. W., & Stone, L. D. (2003). Words of wisdom: Language use over the life span. Journal of Personality and Social Psychology, 85(2), 291-301.
Podesva, R. J. (2007). Phonation type as a stylistic variable: the use of falsetto in constructing a persona. Journal of Sociolinguistics, 11, 478-504.