File naming for PARADISEC archiving
It is understandable that you would want to use a clear, meaningful file-naming regime for your files while in the field; however, once you return and begin to organise your items to be archived, it is inevitable that these file names will change to reflect the shape of your corpus and to adhere to PARADISEC naming conventions. Here are some naming guidelines for files to be archived with PARADISEC.
Say you have, for example, a file that you named 20150908-wordlist-01.wav while in the field. This in-field naming convention includes the date in ISO format (YYYYMMDD). The (-wordlist) marks this as being from a session involving wordlist elicitation. The (-01) indicates that this is the first track of the session. You will have your own conventions.
Anatomy of a PARADISEC file name
The name 20150908-wordlist-01.wav does not fit with the PARADISEC naming conventions. First of all, there needs to be a collection ID followed by a hyphen, and then only two more parts separated by hyphens: CollectionID-ItemID-ContentFile.
- CollectionID is your PARADISEC collection name. It typically takes the form of your initials in capital letters followed by a number. You can choose this ID, but the CoEDL Data Manager or a PARADISEC administrator must confirm that it is available. The ID is created when you set up a collection in PARADISEC (see: Getting Started with PARADISEC). Mine could be JCM1.
- ItemID is also made up of alphanumeric characters. You may use an underscore ( _ ) if you need a separator; hyphens ( - ) are reserved PARADISEC operators, so they cannot be used within a file name, except for the two that separate the three parts of the file name. The ItemID can be used to differentiate recording sessions or events. Your item name might include speaker initials (JB), an abbreviation of the task (WRDLST), or the field site (BIMA).
- ContentFile is also made up of alphanumeric characters. This part of the name allows you to enumerate files of the same format, e.g. photos numbered 001, 002, etc., or multiple tracks of a single audio or video session.
If you want to retain some of the original name, try something like this: JCM01-20150908_WORDLIST-01.wav
| CollectionID | ItemID | ContentFile | File extension |
| JCM01 | 20150908_WORDLIST | 01 | .wav |
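If you have many files to rename, the conversion shown above can be scripted. The following is a minimal sketch in Python, assuming your in-field names follow the date-task-track pattern of the example; the function name `to_paradisec_name` is hypothetical, and you would need to adapt the logic to your own conventions.

```python
from pathlib import Path


def to_paradisec_name(field_name: str, collection_id: str) -> str:
    """Turn an in-field name such as '20150908-wordlist-01.wav' into
    CollectionID-ItemID-ContentFile form.

    Assumption: the last hyphen-separated part of the in-field name is the
    track number (ContentFile); everything before it becomes the ItemID.
    """
    path = Path(field_name)
    parts = path.stem.split("-")
    if len(parts) < 2:
        raise ValueError(f"cannot split {field_name!r} into item and track")

    *item_parts, content_file = parts
    # Hyphens are reserved, so join the item parts with underscores instead.
    item_id = "_".join(item_parts).upper()
    return (
        f"{collection_id.upper()}-{item_id}-{content_file.upper()}"
        f"{path.suffix.lower()}"
    )


print(to_paradisec_name("20150908-wordlist-01.wav", "JCM01"))
```

The sketch only builds the new name as a string; it does not rename anything on disk, so you can review the results before changing your files.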
Any letters in the file name should be in all caps, and the extension should be lowercase. A file name cannot exceed 30 characters, excluding the file extension. The image below diagrams the anatomy of a PARADISEC file name that is 11 characters, including the hyphens, excluding the file extension:
Below is an example of a set of recordings collected to address multilingualism. The collection includes wordlist data, sociolinguistic interviews, and natural speech. I have chosen to use very basic item names: 001, 002, 003, knowing that the specific information of the content language and speaker details will be contained in the metadata. All information contained in the Description column will appear in the Item description field in the PARADISEC catalog.
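Before sending files, you may find it useful to check your names against the rules described in this guide. The sketch below is one possible way to do that in Python. It is not PARADISEC's own validation: the function name `check_name` is hypothetical, and allowing underscores in the ItemID only is an assumption based on the wording above.

```python
import re

MAX_NAME_LENGTH = 30  # limit on the name, excluding the file extension


def check_name(file_name: str) -> list[str]:
    """Return a list of problems found in file_name (empty if none)."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot or not stem or not extension:
        return ["no file extension found"]

    problems = []
    if len(stem) > MAX_NAME_LENGTH:
        problems.append(
            f"name is {len(stem)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    if extension != extension.lower():
        problems.append("extension should be lowercase")
    if stem != stem.upper():
        problems.append("letters in the name should be uppercase")

    parts = stem.split("-")
    if len(parts) != 3:
        problems.append(
            "expected CollectionID-ItemID-ContentFile (exactly two hyphens)"
        )
        return problems

    collection_id, item_id, content_file = parts
    # Assumption: underscores are accepted in the ItemID only.
    if not re.fullmatch(r"[A-Za-z0-9]+", collection_id):
        problems.append("CollectionID should be alphanumeric")
    if not re.fullmatch(r"[A-Za-z0-9_]+", item_id):
        problems.append("ItemID should be alphanumeric (underscores allowed)")
    if not re.fullmatch(r"[A-Za-z0-9]+", content_file):
        problems.append("ContentFile should be alphanumeric")
    return problems
```

For example, `check_name("JCM01-20150908_WORDLIST-01.wav")` returns an empty list, while a name with an uppercase extension or an extra hyphen returns a description of each problem. Note that the check cannot tell whether the first part is a collection ID that actually exists; that still has to be confirmed with the CoEDL Data Manager or a PARADISEC administrator.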
Why correct file names are important for PARADISEC
Files are automatically sent to specific locations within our archive structure. The first part of the file name (JCM1) tells our system that these files are to be sent to the collection JCM1. Similarly, the second part of the name (001) will direct files to item 001 in our archive structure. Items will have already been created in the catalog by the depositor before sending any files to PARADISEC. The third part of the file name (F45) distinguishes it from other files under that item. Distinctive names avoid conflicts and errors for our automated system.
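To illustrate how the parts of a name can determine where a file ends up, here is a small Python sketch that groups file names by collection and item. It is only an illustration of the idea, not PARADISEC's actual system; the function name `plan_routing` and the example file names are hypothetical, and the names are assumed to already follow the CollectionID-ItemID-ContentFile pattern.

```python
from collections import defaultdict


def plan_routing(file_names):
    """Group file names by (CollectionID, ItemID).

    Assumes every name is already valid; a name without an extension or
    without exactly two hyphens raises ValueError when it is unpacked.
    """
    routes = defaultdict(list)
    for name in file_names:
        stem = name.rpartition(".")[0]  # drop the file extension
        collection_id, item_id, _content_file = stem.split("-")
        routes[(collection_id, item_id)].append(name)
    return dict(routes)


routes = plan_routing(["JCM1-001-01.wav", "JCM1-001-02.wav", "JCM1-002-01.wav"])
for (collection_id, item_id), names in routes.items():
    print(f"collection {collection_id}, item {item_id}: {names}")
```

Because the grouping depends entirely on the first two parts of each name, a mistyped collection or item ID would place a file under the wrong group, which is why consistent names matter.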
If you still have questions after reading this guide, or you wish to request a service, feel free to email me (email@example.com) or, better, visit the CoEDL Service Request Form. CoEDL members use the Member login at the bottom of the CoEDL webpage, then click the General Members tab; the link to the request form is in the left-hand panel.