Singapore Institute of Technology

Extremely low footprint end-to-end ASR system for smart device

conference contribution
posted on 2024-04-03, 05:37 authored by Zhifu Gao, Yiwu Yao, Shiliang Zhang, Jun Yang, Ming Lei, Ian McLoughlin

Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate the acoustic, pronunciation and language models into a single neural network that outperforms conventional models. Among E2E approaches, attention-based models, e.g. Transformer, have emerged as being superior. Such models have opened the door to deployment of ASR on smart devices; however, they still require a large number of model parameters. We propose an extremely low footprint E2E ASR system for smart devices that satisfies resource constraints without sacrificing recognition accuracy. We design cross-layer weight sharing to improve parameter efficiency, and further exploit model compression methods, including sparsification and quantization, to reduce memory storage and boost decoding efficiency. We evaluate our approaches on the public AISHELL-1 and AISHELL-2 benchmarks. On the AISHELL-2 task, the proposed method achieves more than 10× compression (model size falls from 248 MB to 24 MB) at the cost of only a minor performance loss (CER increases from 6.49% to 6.92%).
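The three techniques named in the abstract (cross-layer weight sharing, sparsification, quantization) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the round-robin sharing scheme, magnitude-based pruning and symmetric 8-bit quantization are all assumptions chosen for illustration, and the weights are plain Python lists rather than real model tensors.

```python
# Illustrative sketch only: all names and schemes here are hypothetical,
# not taken from the paper.

def build_shared_layers(num_layers, num_unique, make_weights):
    """Cross-layer sharing: num_layers layers reuse num_unique weight sets."""
    unique = [make_weights() for _ in range(num_unique)]
    # Assumed scheme: layer i reuses weight set i % num_unique.
    return [unique[i % num_unique] for i in range(num_layers)]


def stored_weight_sets(layers):
    """Count the distinct weight objects that actually need to be stored."""
    return len({id(w) for w in layers})


def magnitude_prune(weights, sparsity):
    """Sparsification: zero the given fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric 8-bit quantization: return integer codes and the scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def compression_ratio(original_mb, compressed_mb):
    """Ratio of original to compressed model size."""
    return original_mb / compressed_mb


if __name__ == "__main__":
    layers = build_shared_layers(12, 3, lambda: [0.5, -0.1, 0.05, -0.8])
    print("stored weight sets:", stored_weight_sets(layers))
    pruned = magnitude_prune(layers[0], sparsity=0.5)
    codes, scale = quantize_int8(pruned)
    print("pruned:", pruned)
    print("int8 codes:", codes)
    print("compression ratio:", compression_ratio(248, 24))
```

With the sizes reported in the abstract, 248 MB / 24 MB is roughly 10.3, which is consistent with the stated "more than 10×" compression.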


Journal/Conference/Book title

Annual Conference of the International Speech Communication Association, INTERSPEECH, 30 August – 3 September, 2021, Brno, Czechia.

Publication date



  • Published

Rights statement

Gao, Z., Yao, Y., Zhang, S., Yang, J., Lei, M., McLoughlin, I. (2021) Extremely Low Footprint End-to-End ASR System for Smart Device. Proc. Interspeech 2021, 4548-4552, doi: 10.21437/Interspeech.2021-819.
