Singapore Institute of Technology

An effective perturbation based semi-supervised learning method for sound event detection

conference contribution
posted on 2024-04-03, 05:49 authored by Xu Zheng, Yan Song, Jie Yan, Li-Rong Dai, Ian McLoughlin, Lin Liu

Mean teacher based methods are increasingly achieving state-of-the-art performance for large-scale weakly labeled and unlabeled sound event detection (SED) tasks in recent DCASE challenges. By penalizing inconsistent predictions under different perturbations, mean teacher methods can exploit large-scale unlabeled data in a self-ensembling manner. In this paper, an effective perturbation based semi-supervised learning (SSL) method is proposed based on the mean teacher method. Specifically, a new independent component (IC) module is proposed to introduce perturbations for different convolutional layers, designed as a combination of batch normalization and dropblock operations. The proposed IC module can reduce correlation between neurons to improve performance. A global statistics pooling based attention module is further proposed to explicitly model inter-dependencies between the time-frequency domain and channels, using statistics information (e.g. mean, standard deviation, max) along different dimensions. This can provide an effective attention mechanism to adaptively re-calibrate the output feature map. Experimental results on Task 4 of the DCASE2018 challenge demonstrate the superiority of the proposed method, achieving about 39.8% F1-score, outperforming the previous winning system’s 32.4% by a significant margin.
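The two generic ingredients named in the abstract, the mean teacher consistency penalty and pooling of global statistics, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the smoothing factor `alpha`, and the use of mean squared error as the consistency measure are all assumptions, since the abstract does not specify them. The IC module and the attention re-calibration itself are not reproduced here.

```python
import math


def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the matching student weight.

    Mean teacher methods keep the teacher as an exponential moving
    average of the student; alpha is an assumed smoothing factor.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Penalize disagreement between student and teacher predictions.

    Mean squared error is used here as an assumed consistency measure.
    """
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def global_stats(values):
    """Return (mean, population standard deviation, max) along one dimension."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, std, max(values)
```

In a training loop of this kind, `consistency_loss` would typically be computed on unlabeled clips passed through both models under different perturbations, and `ema_update` would be applied to the teacher after each student optimization step.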

Journal/Conference/Book title

Annual Conference of the International Speech Communication Association, INTERSPEECH, October 25–29, 2020, Shanghai, China.

Publication date

2020-10-25

Rights statement

Zheng, X., Song, Y., Yan, J., Dai, L.-R., McLoughlin, I., Liu, L. (2020) An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection. Proc. Interspeech 2020, 841-845, doi: 10.21437/Interspeech.2020-2329.
