Seminar: Separating speaker and co-articulation effects via two-way analysis of sub-band cepstral variances in the Japanese vowels of 300 male speakers, Yuko Kinoshita, 25 Oct
Speaker: Yuko Kinoshita
When: 25 Oct 2019, 3.30pm-5pm
Where: Engma Room (3.165), H C Coombs Building, ANU
The goal of achieving robust forensic voice identification is hampered by a number of complex and intertwined factors of variability in the speech signal, such as (1) speaker differences, (2) co-articulation effects, (3) channel conditions, and (4) elicitation styles. Here we propose to decompose the relative effects of factors (1) and (2) using a 300-speaker database of the 5 Japanese vowels extracted from monosyllabic utterances with 10 phonetic contexts varying only in their preceding consonants. The utterances were produced 4 times in citation style under one channel condition (microphone with 8-kHz bandwidth), and linear-prediction cepstra (order 14) were obtained from the centre frames of the vowel nuclei. Two-way analyses of variance (ANOVA) were carried out to separate speaker and consonantal effects, incorporating a parametric distance formulation that affords selection of any sub-band directly from full-band cepstra. The ANOVA results, based on 8 sub-bands (500-Hz width) spanning the full band of 4 kHz, provide further evidence that front vowels contain more speaker-specific information than back vowels and that, irrespective of the vowel, the sub-bands encompassing the higher formant frequencies (2-3 kHz) are more useful for capturing speaker differences. The results also indicate that, in those higher sub-bands, the contribution of co-articulation effects to the total variability is minimal. These results, however, raise further questions regarding the interactions between vowels and places of articulation in the preceding consonants and, not least, the forensically relevant heterogeneity of the speaker population used.
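For readers unfamiliar with the two-way ANOVA mentioned in the abstract, the following is a minimal sketch of a balanced two-way decomposition of variability into speaker, consonantal-context, interaction and residual parts. It is not the speaker's implementation: the data are synthetic, the function name `two_way_anova_ss` and the effect sizes are assumptions made purely for illustration, and the scalar `y` merely stands in for whatever per-token sub-band measure the study used (the parametric sub-band distance itself is not reproduced here). Only the design sizes (300 speakers, 10 contexts, 4 repetitions) are taken from the abstract.

```python
import numpy as np


def two_way_anova_ss(y):
    """Sums of squares for a balanced two-way design with replication.

    y has shape (n_speakers, n_contexts, n_repeats); each entry is one
    scalar measurement (a hypothetical per-token sub-band measure).
    """
    a, b, r = y.shape
    grand = y.mean()
    speaker_means = y.mean(axis=(1, 2))   # one mean per speaker
    context_means = y.mean(axis=(0, 2))   # one mean per consonantal context
    cell_means = y.mean(axis=2)           # one mean per speaker/context cell

    ss_speaker = b * r * np.sum((speaker_means - grand) ** 2)
    ss_context = a * r * np.sum((context_means - grand) ** 2)
    ss_interaction = r * np.sum(
        (cell_means
         - speaker_means[:, None]
         - context_means[None, :]
         + grand) ** 2
    )
    ss_error = np.sum((y - cell_means[:, :, None]) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    return {
        "speaker": ss_speaker,
        "context": ss_context,
        "interaction": ss_interaction,
        "error": ss_error,
        "total": ss_total,
    }


# Synthetic data only: design sizes follow the abstract, while the
# effect magnitudes below are arbitrary assumptions.
rng = np.random.default_rng(0)
n_speakers, n_contexts, n_repeats = 300, 10, 4
speaker_effect = rng.normal(0.0, 1.0, size=(n_speakers, 1, 1))
context_effect = rng.normal(0.0, 0.3, size=(1, n_contexts, 1))
noise = rng.normal(0.0, 0.5, size=(n_speakers, n_contexts, n_repeats))
y = speaker_effect + context_effect + noise

ss = two_way_anova_ss(y)
# Share of the total variability attributed to each source.
shares = {k: v / ss["total"] for k, v in ss.items() if k != "total"}
for name, share in shares.items():
    print(f"{name:12s} {share:.3f}")
```

In a sub-band analysis of the kind described, a decomposition like this would presumably be repeated per vowel and per sub-band, and the speaker and context shares compared across bands.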