
Archiving achievements – PARADISEC hits 10,000 hours!

Archiving, Nick Thieberger

Date: 17 April 2019

This month, the Pacific and Regional Archive for Digital Sources in Endangered Cultures reached the major milestone of storing 10,000 hours of audio recordings, making up over 57 terabytes of data in 1,205 distinct languages. 

PARADISEC is the main repository for Centre materials, and 2018 proved to be a huge year as it added another 72 languages and an incredible 80,000 files – now totalling 280,000. To top it all off, it won the University of Melbourne’s Award for Excellence in Team-Based Research Programs.

Reflecting on the 16-year journey to build one of the world’s largest ethnographic data repositories, PARADISEC’s Director and CoEDL Chief Investigator Nick Thieberger said it has been a race to save materials with great scientific and cultural value.

“We have collected recordings dating back to the 1950s, some from the personal collections of academics which had been gathering dust in boxes in backyard sheds, garages and attics,” says Nick. “Once we digitise it, that changes everything. Suddenly you have something that’s available everywhere you can get access to the internet, even on your phone.”

Nick believes digitisation and online access represent a quantum shift in the relationship between researchers and the research they create, not only because of crucial open access to primary data, but because of the significant cultural heritage value to the communities that provided it.

“When the people in those communities go on the web to find reflections of their own societies, they typically don’t find them – there’s nothing in their language,” Nick says. “So it has this other wonderful virtue of making recordings available for those people and their descendants.”

Among PARADISEC's activities for the International Year of Indigenous Languages is the 'Mystery Language of the Week', which features a new audio sample each week. The team invites anyone to help identify and describe the mystery languages. Seven samples are already available, and all suggestions, comments or leads are welcome!

Meanwhile, our Corpus Manager Wolfgang Barth has also been busy with a new project to improve access to the Centre’s text-based corpora. A portal website under construction will put a public face to the languages currently being worked on by Centre researchers. Each of about 30 languages will have a page containing basic information, audio of some stories with captions, links to resources like articles or books, and a link to its full corpus in the ANNIS system.

“It’s going to be a jumping in point, or jumping off point if you like, to the ever-increasing number of corpora we are building from languages of our region,” Nick says. “For Bislama, the creole language of Vanuatu, the corpus has just passed one million words. Through our project with the University of the South Pacific, Vanuatu’s national language has become the first in Melanesia to have a major body of written documents of this size.”


Did you know PARADISEC is a registered charity and tax deductible gift recipient? To find out how you could support this important work into the future visit:

(Main image: Dr Amanda Harris, Associate Professor Nick Thieberger and Professor Linda Barwick accept the University of Melbourne’s Award for Excellence in Team-Based Research Programs, December 2018)
