Wednesday morning workshops

A welcome to R & RStudio for fresh users

Josh Clothier and Katie Jepson, The University of Melbourne

R is increasingly used for linguistic data analysis, particularly by those who work with large corpora and want to apply advanced statistical techniques. With the advent of RStudio, R has never been so easy to use! In this short course, we will introduce you to the very basics of R: how to get your data in and out of R, how to install and load libraries (the tools that help you get stuff done), how to do some basic manipulations and calculations on your data, and how to troubleshoot when you come across problems. This course is a particularly good place to start if you're unfamiliar with R but are interested in taking the course Advanced statistics for linguists: tree-based and mixed effects models in R.

Detailed course information


Transcription Acceleration for Language Documentation with ELPIS

Ben Foley and Alina Rakhi, The University of Queensland

The task of transcribing recorded audio is important in many language and speech science workflows, yet it can be very slow. A 2017 survey of 50 linguists found that transcribing one hour of audio takes 40 hours on average. For some languages, ratios of up to 230:1 were reported. Using contemporary software techniques such as automatic speech recognition (ASR), also known as speech-to-text, we can improve the experience of transcribing, which brings practical and psychological benefits for linguists and language workers.

During the workshop, we introduce the process of using the Elpis speech-to-text tool (developed by the CoEDL Transcription Acceleration Project) to obtain transcriptions for untranscribed audio in the context of language documentation. The workshop covers preparing language content to get started, running the system to get transcripts for untranscribed audio, and tuning the system to improve the results. By participating in this workshop, you will develop an understanding of what can be achieved with speech recognition tools (and of their limitations), and of how to incorporate them into your existing workflows with your own data.

Detailed course information


Beyond sound: Creating and Maintaining Speech Databases with Emu

Dr Hywel Stoakes, The University of Auckland

Large linguistic databases invariably require ways to keep time-aligned text synchronised with audio recordings. As the volume of audio and associated annotations balloons, it is becoming increasingly important to have a standard, transferable method for metadata storage and analysis. The Emu Speech Database Management System (https://ips-lmu.github.io/The-EMU-SDMS-Manual/) and its associated R package emuR are one way of managing these records.

This workshop will go through some of the basics of creating Emu databases, including how to make databases from scratch and how to convert existing collections from Praat TextGrid, ELAN XML and plain text. We will then look at some simple analysis and statistical techniques, concentrating on linguistic questions. We will also touch on ways to incorporate this into modern speech recognition workflows such as ELPIS. A basic knowledge of R and RStudio will be assumed, but is not essential in order to complete the course. Familiarity with tidy data and the "tidyverse", as well as a working knowledge of phonetic theory, will be an advantage.

Detailed course information to come
