Automatic speech segmentation with limited data
Abstract
The rapid development of corpus-based speech systems, such as concatenative synthesis systems, for under-resourced languages requires an efficient, consistent and accurate solution for phonetic speech segmentation. Manual development of phonetically annotated corpora is a time-consuming and expensive process that suffers from problems of consistency and reproducibility. Automation of this process, on the other hand, has only been satisfactorily demonstrated on large corpora of a select few languages, using techniques that require extensive and specialised resources.
In this work we considered the problem of phonetic segmentation in the context of developing small prototypical speech synthesis corpora for new under-resourced languages. This was done through an empirical evaluation of existing segmentation techniques on typical speech corpora in three South African languages. In this process, the performance of these techniques was characterised under different data conditions, and the efficient application of these techniques was investigated in order to improve the accuracy of the resulting phonetic alignments.
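Comparing automatic alignments against a manual reference is usually done at the level of phone boundaries. A common measure in the segmentation literature is the fraction of automatic boundaries that fall within a fixed tolerance (often 20 ms) of the corresponding manual boundary. The sketch below computes that measure; it is an illustration only, not necessarily the measure used in this work, and it assumes both segmentations label the same phone sequence so that boundaries correspond one-to-one. The function name `boundary_agreement` and the 20 ms default are hypothetical choices.

```python
def boundary_agreement(auto_ms, manual_ms, tol_ms=20.0):
    """Fraction of automatic boundaries within tol_ms of the manual ones.

    Hypothetical helper: assumes both lists describe the same phone
    sequence, so boundary i in one list corresponds to boundary i in
    the other. Times are in milliseconds.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must correspond one-to-one")
    if not auto_ms:
        return 0.0
    # Count boundaries whose absolute deviation is within the tolerance.
    hits = sum(abs(a - m) <= tol_ms for a, m in zip(auto_ms, manual_ms))
    return hits / len(auto_ms)


# Toy data: deviations are 0, 12, 25 and 5 ms, so 3 of 4 are within 20 ms.
auto = [0.0, 112.0, 250.0, 410.0]
manual = [0.0, 100.0, 275.0, 405.0]
print(boundary_agreement(auto, manual))  # 0.75
```

Reporting the score at several tolerances (for example 10, 20 and 30 ms) gives a fuller picture than a single threshold.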
We found that baseline speaker-specific Hidden Markov Models yield relatively robust and accurate alignments even under extremely limited data conditions, and we demonstrated how such models can be developed and applied efficiently for this task. The resulting segmentation is of sufficient quality for synthesis applications, with alignment quality comparable to that of manual segmentation efforts in this context. Finally, possibilities for further automated refinement of phonetic alignments were investigated, and an efficient corpus development strategy was proposed, with suggestions for further work in this direction.
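HMM-based segmentation of this kind is typically performed as forced alignment: the known phone sequence of each utterance is aligned to the acoustic frames with the Viterbi algorithm. The sketch below shows that step in its simplest form, assuming one emitting state per phone, a strictly left-to-right topology, uniform transition probabilities, and per-frame log-likelihoods that have already been computed by some acoustic model. Practical aligners (for example, HTK-style systems) usually use several states per phone and trained emission densities; the function `forced_align` and the input layout here are hypothetical.

```python
import numpy as np


def forced_align(log_lik):
    """Viterbi forced alignment of a phone sequence to frames.

    log_lik[t, j] is the log-likelihood of frame t under the j-th phone of
    the transcription. Each phone must occupy at least one frame and phones
    appear in transcription order. Transition probabilities are assumed
    uniform and therefore omitted. Returns the start frame of each phone.
    """
    T, N = log_lik.shape
    if T < N:
        raise ValueError("need at least one frame per phone")
    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=int)  # 0 = stayed in phone j, 1 = entered from j-1
    score[0, 0] = log_lik[0, 0]  # the first frame must belong to the first phone
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1, j]
            enter = score[t - 1, j - 1] if j > 0 else -np.inf
            if enter > stay:
                score[t, j] = enter + log_lik[t, j]
                back[t, j] = 1
            else:
                score[t, j] = stay + log_lik[t, j]
    # Backtrace from the last frame, which must belong to the last phone.
    starts = [0] * N
    j = N - 1
    for t in range(T - 1, 0, -1):
        if back[t, j] == 1:
            starts[j] = t  # phone j was entered at frame t
            j -= 1
    return starts


# Toy example: 6 frames, 3 phones; each frame clearly favours one phone.
ll = np.log(np.array([
    [0.9, 0.05, 0.05],
    [0.9, 0.05, 0.05],
    [0.05, 0.9, 0.05],
    [0.05, 0.9, 0.05],
    [0.05, 0.05, 0.9],
    [0.05, 0.05, 0.9],
]))
print(forced_align(ll))  # [0, 2, 4]
```

Multiplying the returned start frames by the analysis frame shift (10 ms is a common choice) converts them to boundary times, which can then be compared with manual boundaries.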
Related items
Showing items related by title, author, creator and subject.
- A smartphone-based ASR data collection tool for under-resourced languages
  De Vries, Nic J.; Badenhorst, Jaco; Basson, Willem D.; De Wet, Febe; Barnard, Etienne; De Waal, Alta; Davel, Marelie H. (Elsevier, 2014). Acoustic data collection for automatic speech recognition (ASR) purposes is a particularly challenging task when working with under-resourced languages, many of which are found in the developing world. We provide a brief ...
- Hate Speech Provisions and Provisos: A Response to Marais and Pretorius and Proposals for Reform
  Botha, J C; Govindjee, A (2017-11-06). This article responds to some of the issues raised by Marais and Pretorius in their 2015 article titled "A Contextual Analysis of the Hate Speech Provisions of the Equality Act" published in 2015(18)4 PER 901. In particular, ...
- Demystifying hate speech under the PEPUDA
  Geldenhuys, Judith; Kelly-Louw, Michelle (PER/PELJ, 2020). The factual matrix that is considered in each hate speech case differs from that in the next. However, certain factors always remain key in the process of balancing the different constitutional rights at play: who the ...