Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR

Tang, Jian; Zhang, Jie; Song, Yan; McLoughlin, Ian; Dai, Li-Rong

doi:10.1109/TASLP.2021.3101921

File(s) not publicly available

journal contribution

posted on 2024-04-03, 02:56 authored by Jian Tang, Jie Zhang, Yan Song, Ian McLoughlin, Li-Rong Dai

Encoder-decoder based automatic speech recognition (ASR) methods are increasingly popular due to their simplified processing stages and low reliance on prior knowledge. Conventional encoder-decoder based approaches usually learn a sequence-to-sequence mapping function from the source speech to target units (e.g., subwords, characters) in an end-to-end manner. However, it is still unclear how to choose the optimal target unit, or the granularity of multiple units. Because increasing the information available for learning sequence-to-sequence mapping functions can generally improve modeling effectiveness, we propose a multi-granularity sequence alignment (MGSA) approach. It aims to enhance cross-sequence interactions between units of different granularity in both the modeling and inference stages of encoder-decoder based ASR. Specifically, a decoder module is designed to generate multi-granularity sequence predictions. We then exploit the latent alignment mapping among units of different granularity levels by using the decoded multi-level sequences as input for model prediction. The cross-sequence interaction can also be employed to re-calibrate output probabilities in the proposed post-inference algorithm. Experimental results on both the WSJ-80 hrs and Switchboard-300 hrs datasets show that the proposed method outperforms traditional multi-task methods as well as single-granularity baseline systems.
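The abstract does not spell out the post-inference algorithm, so the following is only an illustrative sketch of the general idea of re-calibrating hypothesis scores with a second granularity. It assumes each candidate transcript already carries a log-probability from a subword-level decoder and one from a character-level decoder, and it combines them by simple log-linear interpolation. The function name `recalibrate`, the `weight` parameter, and the toy scores are hypothetical and are not taken from the paper.

```python
def recalibrate(hyps, weight=0.5):
    """Pick the best hypothesis after mixing two granularity scores.

    hyps: list of (text, subword_logp, char_logp) tuples (assumed inputs).
    weight: share given to the character-level score (assumed hyperparameter).
    Note: log-linear interpolation is a stand-in here, not the paper's method.
    """
    rescored = [
        (text, (1.0 - weight) * sub_lp + weight * char_lp)
        for text, sub_lp, char_lp in hyps
    ]
    # Return the (text, combined_score) pair with the highest combined score.
    return max(rescored, key=lambda item: item[1])


# Toy example: the subword decoder slightly prefers "the cat",
# but the character-level score strongly prefers "the cap".
hyps = [("the cat", -1.0, -4.0), ("the cap", -1.2, -2.0)]
best_text, best_score = recalibrate(hyps, weight=0.5)
```

With `weight=0.0` the choice falls back to the subword score alone, which makes the effect of the second granularity easy to inspect on a held-out set.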

History

Journal/Conference/Book title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Publication date

2021-08-06

Keywords

Multi-granularity sequence alignment, end-to-end ASR, encoder-decoder, post-inference, deep learning

Licence

In Copyright
