Friday, 10 December 2021: Morning, 9.00 - 12.30
- Information theory and human language
Instructors: John Mansfield, Charles Kemp
Information theory has become important in many scientific disciplines, but its foundational questions are closely linked to linguistics: How can we quantify amounts of information? How do information quantities change according to context? How is information reflected in the length of symbolic codes, and what makes a coding system efficient?
Over the last decade or so, information-theoretic linguistics has been flourishing. Information (also known as surprisal, entropy, or unpredictability) gives us new ways of understanding phonology, morphology, syntax and semantics. Quantitative studies have revealed remarkable commonalities among languages of the world, while also highlighting the diverse structural solutions that meet the need for communicative efficiency. Information theory binds together the four CoEDL themes of design space, cognitive processing, learnability and evolutionary process.
This course will provide a whirlwind tour of information-theoretic linguistics, starting from first principles, and explaining the remarkably simple maths behind the fancy symbols. Attendees will come away with a fresh perspective on human language, and a new set of tools to apply in their own research.
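As a small taste of the maths involved, the sketch below computes two of the quantities mentioned above: the surprisal of an outcome with probability p, which is -log2(p) bits, and the entropy of a distribution, which is its average surprisal. It is an illustration only and not course material; the toy word sample and the function names are invented for this example.

```python
import math
from collections import Counter

def surprisal(p):
    """Surprisal in bits of an outcome with probability p."""
    return -math.log2(p)

def entropy(counts):
    """Shannon entropy in bits: the probability-weighted average surprisal."""
    total = sum(counts.values())
    return sum((c / total) * surprisal(c / total) for c in counts.values())

# Toy word sample, invented for illustration (not real corpus data).
words = "the cat saw the dog and the dog saw the cat".split()
counts = Counter(words)
total = sum(counts.values())

# Frequent words are predictable and carry little information;
# rare words are surprising and carry more.
for word, c in counts.most_common():
    p = c / total
    print(f"{word}: p={p:.3f}, surprisal={surprisal(p):.2f} bits")

print(f"entropy = {entropy(counts):.2f} bits per word")
```

In this sample the most frequent word has the lowest surprisal, which is the intuition behind efficient codes giving shorter forms to more predictable items.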
- Studying variation and change in spontaneous speech corpora
Instructors: Catherine Travis, Benjamin Purser
In this hands-on workshop, participants will learn how to conduct corpus searches and extract data for analysis using the corpus-management tool, LaBB-CAT. We will work with the Sydney Speaks corpus, a sociolinguistic corpus of around 1 million words from over 200 speakers of Australian English, compiled with the support of CoEDL.
LaBB-CAT is an open-source, browser-based corpus management tool — developed by the New Zealand Institute of Language, Brain and Behaviour — that stores audio (and/or video) recordings with time-aligned transcription. It is widely used as a forced aligner, but this workshop focuses on its use as a concordance program, for searching text or regular expressions, and downloading the corresponding transcription and audio segments, for coding and analysis.
In the first half of the workshop, you will be taken through searches for different kinds of variables (phonetic, morphosyntactic, discourse), and in the second half, you will get to know the tool better by conducting guided searches of your own.
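For readers unfamiliar with concordance searches, the general idea of a regular-expression search with surrounding context can be sketched in a few lines of Python. This is not LaBB-CAT's own interface, which is browser-based; the `concordance` function and the transcript snippet are hypothetical and invented for this example.

```python
import re

def concordance(text, pattern, width=20):
    """Return (left context, match, right context) for each regex match."""
    rows = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

# Invented transcript snippet, not taken from any corpus.
transcript = (
    "yeah I was going to the shops and she was like "
    "running late so we were waiting"
)

# Find every word ending in -ing, a typical starting point for
# extracting tokens of a variable for later coding.
for left, hit, right in concordance(transcript, r"\b\w+ing\b"):
    print(f"{left:>20} [{hit}] {right}")
```

A corpus tool adds what this sketch lacks: speaker metadata, time alignment, and access to the matching audio segments.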
Friday, 10 December 2021: Afternoon, 1.30 - 5.00
- How to start collecting child language acquisition data and what to do with it
Instructors: Lucy Davidson, Rebecca Defina, Barbara Kelly, Evan Kidd
A comprehensive theory of language acquisition must explain how infants can learn any one of the world’s 7000 or so languages. However, theories of language and cognitive development are grounded in data from only 1-2% of the world’s languages, and this sample is strongly biased towards major European languages. We are working to build a new pathway for much-needed research to expand the small child language evidential base. Our goal is to facilitate the creation of typologically diverse initial ‘sketch’ descriptions of language acquisition. Many CoEDL researchers are ideally placed to contribute to our understanding of language development in lesser-studied languages. If you’re a documentary linguist or a developmental linguist, or you’re simply interested in language description, then you may have considered collecting child language data but not know how to begin or what to do with it. This master class will guide you in these endeavours.
This course will provide a model for developing an initial sketch description of how children learn a lesser-studied language based on 5 hours of data that could be collected across one fieldtrip. The class assumes no background knowledge. We begin with an introduction to working with children, laying out the steps for data collection, processing, and corpus construction. We then take participants through the features of the sketch grammar model with reference to new sketch grammars developed from the general model. Participants will come away with a clear pathway for developing a new research area of child language documentation in a lesser-studied language.
- Computational methods for linguistic typology
Instructor: Matt Carroll
This masterclass will show how computational methods can be used to amplify the efforts of fine-grained linguistic typology to answer core CoEDL questions of how and why languages differ.
The course will take us through the typological data cycle and show how we can use computational methods to collect, manage, analyse and share typological data. We will see how current methods from data science allow us to increase the quantity and quality of our typological data as well as make it available to other researchers in online repositories. We will see how to build typologies using modern typological frameworks and how to develop these into explicit linguistic models in order to classify our data with precision, speed and detail.
The course will focus on the typology of morphological exponence but will be generalisable to any linguistic domain.
Saturday, 11 December 2021: Morning, 9.00 - 12.30
- Evolutionary modelling of population-level language variation and change
Instructors: Felicity Meakins, Lindell Bromham, Xia Hua
This masterclass will highlight interdisciplinary work in CoEDL that links linguists to evolutionary biologists, developing new ways of investigating the patterns and processes of language change. Lindell will start with a general introduction to the intersections between biological evolution and language change, highlighting both similarities and differences, and giving an overview of the many ways in which techniques developed in evolutionary biology can be adapted to asking questions in linguistics (hint: it’s not about making phylogenies!). Felicity will describe field methods and data collection strategies that can produce powerful datasets suited to statistical analysis and can be used to test long-standing ideas in linguistics such as processes of simplification and complexification, linguistic coherence, and the influence of social and generational factors on patterns of language change over time. Xia will then give an overview of methods she has developed, adapting evolutionary models to be applicable to language data collected from speaker populations, explaining the aims and the basic approach (note that this will be a general demonstration of the methods, not hands-on participation).
- Phrase-based language learning resource creation and documentation with the Listen N Talk app
Instructors: Mark Richards, Caroline Jones
Many linguistic resources are a detailed ‘disassembly’ of language, word by word and morpheme by morpheme. How can documentation also cater to learners who want to communicate in full phrases? This workshop will feature a hands-on introduction to the new Listen N Talk app shell created in CoEDL and freely available. The app allows community members and/or linguists to repurpose archival or newly made recordings of significant phrases that can be used for language learning. As this is a practical workshop there is a limit of 20 participants.
Saturday, 11 December 2021: Afternoon, 1.30 - 5.00
- Documenting multilingual contexts: working with communities to analyse multilingual practices and language ideologies
Instructors: Ruth Singer, Jill Vaughan
This workshop explores approaches to researching multilingualism, with a particular focus on communities where many small languages are still spoken. These contexts of small-scale multilingualism challenge established ideas about how languages coexist and change. CoEDL scholars have been central to the emergence of this new field of research, which has implications for how we understand the evolution of contemporary linguistic diversity. Small-scale multilingualism will be introduced through an account of language use in Warruwi and Maningrida, two coastal Arnhem Land communities where the presenters work. The workshop will also consider some of the key methods: community collaboration, linguistic biography, ethnography and historical research. Participants will be able to try out some of the methods by reflecting on their own experiences of multilingualism. This course is aimed at people who do research with multilingual communities, or are interested in doing so, but others are also very welcome to participate.
- Transcription acceleration with Elpis
Instructors: Ben Foley, Nick Lambourne, Matt Low
Do you have a love/hate relationship with transcription? In recent years, advances in speech recognition technologies have inspired people to use this tech to accelerate their language transcription work. However, access to the technology has generally been difficult due to the specific skills required. This course will explore the use of the user-friendly Elpis speech recognition system to make initial approximate transcripts as a basis for further editing. Elpis can be trained using small quantities of transcribed language recordings, and doesn't require a degree in computer science to use. The course will cover how to prepare data for Elpis, and how to train Elpis to recognise participants' own language data. Different approaches will be explored, including the use of pre-trained systems that use languages with large quantities of recordings, adapting them to target languages with smaller amounts of recordings. Elpis is being developed by CoEDL in collaboration with linguists and language workers nationally and internationally.