NWU Institutional Repository

Language identification of individual words with joint sequence models

Authors

Giwa, Oluwapelumi
Davel, Marelie H.

Publisher

Interspeech 2014

Abstract

Within a multilingual automatic speech recognition (ASR) system, knowledge of the language of origin of unknown words can improve pronunciation modelling accuracy. This is of particular importance for ASR systems required to deal with code-switched speech or proper names of foreign origin. For words that occur in the language model but not in the pronunciation lexicon, text-based language identification (T-LID) of a single word in isolation may be required. This is a challenging task, especially for short words. We motivate the importance of accurate T-LID in speech processing systems and introduce a novel way of applying Joint Sequence Models (JSMs) to the T-LID task. We obtain competitive results on a real-world four-language task: our best JSM system obtains an F-measure of 97.2%, compared to an F-measure of 95.2% obtained with a state-of-the-art Support Vector Machine (SVM).

Citation

Oluwapelumi Giwa and Marelie H. Davel, “Language identification of individual words with joint sequence models”, in Proc. Interspeech, pp. 1400-1404, Singapore, 2014. [http://engineering.nwu.ac.za/multilingual-speech-technologies-must/publications]
