An effective perturbation based semi-supervised learning method for sound event detection
Mean-teacher-based methods have increasingly achieved state-of-the-art performance on large-scale weakly labeled and unlabeled sound event detection (SED) tasks in recent DCASE challenges. By penalizing inconsistent predictions under different perturbations, mean teacher methods can exploit large-scale unlabeled data in a self-ensembling manner. In this paper, an effective perturbation-based semi-supervised learning (SSL) method is proposed, built on the mean teacher method. Specifically, a new independent component (IC) module, designed as a combination of batch normalization and DropBlock operations, is proposed to introduce perturbations for different convolutional layers. The proposed IC module can reduce the correlation between neurons and thereby improve performance. A global-statistics-pooling-based attention module is further proposed to explicitly model inter-dependencies between the time-frequency domain and channels, using statistical information (e.g., mean, standard deviation, max) along different dimensions. This module provides an effective attention mechanism for adaptively re-calibrating the output feature map. Experimental results on Task 4 of the DCASE2018 challenge demonstrate the superiority of the proposed method, which achieves an F1-score of about 39.8%, outperforming the previous winning system’s 32.4% by a significant margin.
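
To make the self-ensembling idea concrete, the following is a minimal sketch of the two generic ingredients of a mean teacher setup: the teacher's weights track an exponential moving average (EMA) of the student's weights, and a consistency term penalizes disagreement between the two models' predictions on the same (differently perturbed) input. It is not the proposed system. The function names, the decay value `alpha`, and the use of mean squared error as the consistency measure are assumptions made for illustration, since the abstract does not specify them.

```python
# Illustrative mean teacher sketch in plain Python (not the paper's implementation).
# Weights and predictions are represented as flat lists of floats for simplicity.


def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights as an EMA of the student weights.

    alpha is a hypothetical decay value; the abstract does not give one.
    """
    if len(teacher) != len(student):
        raise ValueError("weight lists must have the same length")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions.

    MSE is one common choice of consistency measure, assumed here.
    """
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction lists must have the same length")
    if not student_probs:
        raise ValueError("prediction lists must not be empty")
    squared = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(squared) / len(squared)
```

In a training loop of this kind, the consistency term would typically be added to the supervised loss computed on labeled clips, and `ema_update` would be called after each optimizer step on the student, so that unlabeled clips still contribute a training signal.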