0

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

A novel language identification system using conformer layers and attentive temporal pooling achieves high accuracy with domain adaptation techniques and outperforms LSTM and transformer models.

Year
2022
Venue
arXiv 2022
Authors
5
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2202.12163v4ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches to allow adapting an existing language identification model without retraining the model parameters for a new domain. We perform a comparative study of different model topologies under different constraints of model size, and find that conformer-based models significantly outperform LSTM and transformer based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.

Authors

5