Language identification of individual words with joint sequence models
Authors
Giwa, Oluwapelumi
Davel, Marelie H.
Publisher
Interspeech 2014
Abstract
Within a multilingual automatic speech recognition (ASR) system,
knowledge of the language of origin of unknown words
can improve pronunciation modelling accuracy. This is of particular
importance for ASR systems required to deal with code-switched
speech or proper names of foreign origin. For words
that occur in the language model, but do not occur in the pronunciation
lexicon, text-based language identification (T-LID)
of a single word in isolation may be required. This is a challenging
task, especially for short words. We motivate the
importance of accurate T-LID in speech processing systems and
introduce a novel way of applying Joint Sequence Models to the
T-LID task. We obtain competitive results on a real-world 4-language
task: for our best JSM system, an F-measure of 97.2%
is obtained, compared to an F-measure of 95.2% obtained with a
state-of-the-art Support Vector Machine (SVM).
Citation
Oluwapelumi Giwa and Marelie H. Davel, “Language identification of individual words with joint sequence models”, in Proc. Interspeech, pp. 1400-1404, Singapore, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
