Spoken language identification system adaptation in under-resourced environments
Abstract
Speech technologies have matured over the past few decades and have made significant impacts in a variety of fields, from assistive technologies to personal assistants. However, speech system development is a resource-intensive activity that requires language resources such as text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and this scarcity severely inhibits the deployment of Automatic Speech Recognition (ASR) systems in these regions. Given that few task-specific corpora exist and that speech technology systems perform poorly when deployed in a new environment, we investigate the use of acoustic model adaptation. We propose a new blind deconvolution technique that rapidly adapts acoustic models to a new environment and increases their overall robustness. This technique is applied in a Spoken Language Identification (SLID) system, where it significantly improves accuracy by 6% relative to the baseline system and achieves performance comparable to that of more computationally intensive standard adaptation techniques.
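The abstract does not describe the proposed blind deconvolution technique itself. As a rough illustration of the general principle only, a stationary convolutional channel becomes an additive offset in the cepstral domain, so estimating that offset from the features and subtracting it performs a simple form of blind deconvolution (cepstral mean normalisation). The Python sketch below shows this under that assumption; the function and variable names (e.g. blind_channel_normalise) are hypothetical and this is not the specific method proposed in the work.

```python
# Illustrative sketch only, not the technique proposed in this work.
# Principle: in the cepstral domain, a fixed channel h appears additively,
# y[t] = x[t] + h, so subtracting the long-term mean removes the channel
# without knowing it in advance ("blind" deconvolution).
import numpy as np

def blind_channel_normalise(cepstra: np.ndarray) -> np.ndarray:
    """Remove a stationary channel from cepstral features of one recording.

    cepstra: array of shape (num_frames, num_coefficients), e.g. MFCCs.
    Returns the features with the per-recording mean subtracted.
    """
    # Estimate the unknown channel as the time average of the cepstra.
    channel_estimate = cepstra.mean(axis=0, keepdims=True)
    return cepstra - channel_estimate

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in features with an artificial constant channel offset added.
    features = rng.normal(size=(300, 13)) + rng.normal(size=(1, 13))
    normalised = blind_channel_normalise(features)
    print(normalised.mean(axis=0))  # approximately zero after normalisation
```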