Semi-Supervised Training for Lecture Transcription in Resource-Scarce Environments
Authors
De Villiers, Pieter
Barnard, Etienne
van Heerden, Charl J.
Jooste, Petri
Publisher
Pattern Recognition Association of South Africa and Mechatronics International Conference
Abstract
We present a study where standard semi-supervised
training methods are applied in a resource-scarce environment
to build lecture transcription systems. Experiments are
conducted on two different corpora which one can expect to
be available in resource-scarce environments. These include 1)
speaker- and domain-specific data where a single South African
English lecturer presents the “Operating Systems” course, and
2) Afrikaans speaker-independent and domain non-specific data
collected from science and law courses. Different amounts of
acoustic and language model data are used for training the
respective models. We find that lecture transcription systems
in resource-scarce environments can benefit substantially from
semi-supervised training methods. We also describe a small, new
corpus of spoken lectures which is freely available in the public
domain.
Citation
Pieter De Villiers, Etienne Barnard, Charl van Heerden and Petri Jooste, “Semi-Supervised Training for Lecture Transcription in Resource-Scarce Environments”, in Proc. Annual Symp. Pattern Recognition Association of South Africa (PRASA), pp. 7-12, Cape Town, South Africa, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
