Singapore Institute of Technology

File(s) not publicly available

Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection

conference contribution
posted on 2024-04-03, 06:50 authored by Jie Yan, Yan Song, Li-Rong Dai, Ian McLoughlin

Weakly labeled semi-supervised learning methods have recently drawn increasing attention from the research community for sound event detection tasks. Because the labels are weak, neural networks are often designed to perform sound event detection (SED) and audio tagging (AT) at the same time. In this paper, we propose a task-aware mean teacher method that uses a convolutional recurrent neural network (CRNN) with a multi-branch structure to handle the SED and AT tasks differently. Specifically, a branch with coarse-level temporal resolution is designed for the AT task, while a branch with fine-level temporal resolution is designed for the SED task. The mean-teacher-based semi-supervised learning method is first adopted to improve the performance of the coarse-level AT branch by exploiting unlabeled data. The coarse-level AT branch is then used as a teacher to guide the aggregated AT output of the fine-level SED branch, which improves SED performance. To further improve AT and SED performance, information from multiple layers is exploited in the form of a multi-resolution feature. Experimental results on Task 4 of the DCASE 2018 challenge demonstrate the superiority of the proposed method, which achieves a 37.7% F1-score and outperforms the winning system's 32.4%.
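
The two ideas named in the abstract, a mean teacher whose weights track the student and a coarse AT output that guides the aggregated output of the fine SED branch, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the smoothing factor `alpha`, the use of mean pooling to aggregate frame-level probabilities, and the squared-error consistency term are all assumptions chosen only to keep the example small.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.999) -> Dict[str, float]:
    """Mean-teacher step: teacher <- alpha * teacher + (1 - alpha) * student.

    Weights are plain floats keyed by name (an illustrative simplification).
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def aggregate_frames(frame_probs: List[List[float]]) -> List[float]:
    """Pool frame-level SED probabilities (frames x classes) into clip-level
    AT probabilities. Mean pooling is an assumption, not the paper's choice."""
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(frame[c] for frame in frame_probs) / n_frames
            for c in range(n_classes)]


def consistency_loss(guide: List[float], guided: List[float]) -> float:
    """Mean squared error between the guiding (coarse AT) clip-level output
    and the guided (aggregated fine SED) clip-level output."""
    return sum((g - s) ** 2 for g, s in zip(guide, guided)) / len(guide)


# Hypothetical values: 2 frames, 2 event classes.
sed_frames = [[0.0, 1.0], [1.0, 1.0]]      # fine-level SED branch output
at_coarse = [0.5, 0.0]                     # coarse-level AT branch output
at_from_sed = aggregate_frames(sed_frames)
loss = consistency_loss(at_coarse, at_from_sed)
```

In this sketch the consistency term would be added to the supervised loss on weakly labeled clips, so that unlabeled clips still contribute a training signal through the agreement between the two branches.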


Journal/Conference/Book title

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain.
