Singapore Institute of Technology
zhang20da_interspeech.pdf (366 kB)

Semi-supervised end-to-end ASR via teacher-student learning with conditional posterior distribution

conference contribution
posted on 2024-04-03, 05:59 authored by Zi-Qiang Zhang, Yan Song, Jian-shu Zhang, Ian McLoughlin, Li-Rong Dai

Encoder-decoder based methods have become popular for automatic speech recognition (ASR), thanks to their simplified processing stages and low reliance on prior knowledge. However, large amounts of acoustic data with paired transcriptions are generally required to train an effective encoder-decoder model, and such data is expensive and time-consuming to collect and not always readily available. Unpaired speech data, by contrast, is abundant, so several semi-supervised learning methods, such as teacher-student (T/S) learning and pseudo-labeling, have recently been proposed to utilize this potentially valuable resource. In this paper, a novel T/S learning method with conditional posterior distribution for encoder-decoder based ASR is proposed. Specifically, the 1-best hypotheses and the conditional posterior distribution from the teacher are exploited to provide more effective supervision. Combined with model perturbation techniques, the proposed method achieves a 19.2% relative WER reduction on the LibriSpeech benchmark, compared with a system trained using only paired data. This outperforms previously reported 1-best hypothesis results on the same task.
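The abstract does not give the training objective, so the following is only an illustrative sketch of how a student might be supervised by both a teacher's 1-best hypothesis and the teacher's per-step posterior. The function name `ts_loss`, the interpolation weight `alpha`, and the use of a simple weighted sum of two cross-entropy terms are assumptions made for illustration, not the formulation from the paper.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def ts_loss(student_logits, teacher_probs, hyp_tokens, alpha=0.5):
    """Hypothetical teacher-student loss for one unpaired utterance.

    student_logits: (T, V) student decoder outputs, one row per output step.
    teacher_probs:  (T, V) teacher posterior at each step, assumed here to be
                    conditioned on the teacher's 1-best prefix.
    hyp_tokens:     (T,)   token ids of the teacher's 1-best hypothesis.
    alpha:          assumed weight between hard and soft supervision.
    """
    logp = log_softmax(student_logits)
    steps = np.arange(len(hyp_tokens))
    # Hard term: cross-entropy against the 1-best tokens (pseudo-labels).
    hard = -logp[steps, hyp_tokens].mean()
    # Soft term: cross-entropy against the full teacher distribution.
    soft = -(teacher_probs * logp).sum(axis=-1).mean()
    return alpha * hard + (1.0 - alpha) * soft
```

In this sketch, setting `alpha=1.0` reduces to plain pseudo-labeling on the 1-best hypothesis, while `alpha=0.0` uses only the teacher's distribution. When the teacher's posterior is one-hot on the hypothesis tokens, the two terms coincide.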

History

Journal/Conference/Book title

Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, October 25–29, 2020, Shanghai, China.

Publication date

2020-10-25

Version

  • Published

Rights statement

Zhang, Z.-q., Song, Y., Zhang, J.-s., McLoughlin, I., Dai, L.-R. (2020) Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution. Proc. Interspeech 2020, 3580-3584, doi: 10.21437/Interspeech.2020-1574.
