Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution

Zhang, Zi-Qiang; Song, Yan; Zhang, Jian-shu; McLoughlin, Ian; Dai, Li-Rong

doi:10.21437/Interspeech.2020-1574

conference contribution

posted on 2024-04-03, 05:59 authored by Zi-Qiang Zhang, Yan Song, Jian-shu Zhang, Ian McLoughlin, Li-Rong Dai

Encoder-decoder based methods have become popular for automatic speech recognition (ASR), thanks to their simplified processing stages and low reliance on prior knowledge. However, training an effective encoder-decoder model generally requires large amounts of acoustic data with paired transcriptions, which are expensive and time-consuming to collect and not always readily available. Unpaired speech data, by contrast, is abundant, so several semi-supervised learning methods, such as teacher-student (T/S) learning and pseudo-labeling, have recently been proposed to exploit this potentially valuable resource. This paper proposes a novel T/S learning method with conditional posterior distribution for encoder-decoder based ASR. Specifically, the 1-best hypotheses and the conditional posterior distribution from the teacher are exploited to provide more effective supervision. Combined with model perturbation techniques, the proposed method achieves a 19.2% relative WER reduction on the LibriSpeech benchmark compared with a system trained using only paired data. This outperforms previously reported 1-best hypothesis results on the same task.
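The idea of supervising a student with both the teacher's 1-best hypothesis and its conditional posterior distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the interpolation weight `alpha`, and the choice to combine the two terms by simple interpolation are all assumptions made for illustration. It assumes the student is run with the teacher's 1-best hypothesis as decoder input, so that at each step both models are conditioned on the same token prefix.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one decoding step."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ts_loss(student_logits, teacher_probs, hypothesis, alpha=0.5):
    """Hypothetical per-utterance loss, averaged over decoding steps.

    student_logits: per-step student logits over the vocabulary
    teacher_probs:  per-step teacher posteriors, conditioned on the
                    same 1-best hypothesis prefix
    hypothesis:     token ids of the teacher's 1-best hypothesis
    alpha:          assumed weight between hard and soft targets
    """
    hard = 0.0  # cross-entropy against the 1-best tokens
    soft = 0.0  # cross-entropy against the teacher posterior
    for logits, t_probs, token in zip(student_logits, teacher_probs, hypothesis):
        s_probs = softmax(logits)
        hard += -math.log(s_probs[token])
        soft += -sum(t * math.log(s) for t, s in zip(t_probs, s_probs) if t > 0)
    return (alpha * hard + (1.0 - alpha) * soft) / len(hypothesis)

# Toy example: vocabulary of 3 tokens, hypothesis of 2 tokens.
loss = ts_loss(
    student_logits=[[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]],
    teacher_probs=[[0.8, 0.15, 0.05], [0.1, 0.7, 0.2]],
    hypothesis=[0, 1],
)
print(f"loss = {loss:.4f}")
```

With `alpha=1.0` the sketch reduces to plain pseudo-labeling on the 1-best hypothesis; lower values give more weight to the teacher's full per-token distribution.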

History

Journal/Conference/Book title

Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, October 25–29, 2020, Shanghai, China.

Publication date

2020-10-25

Version

Published

Rights statement

Zhang, Z.-q., Song, Y., Zhang, J.-s., McLoughlin, I., Dai, L.-R. (2020) Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution. Proc. Interspeech 2020, 3580-3584, doi: 10.21437/Interspeech.2020-1574.
