Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection
Weakly labeled semi-supervised learning methods have recently drawn increasing attention from the research community for sound event detection tasks. Because weak labels give only clip-level event tags without timestamps, neural networks are often designed to perform sound event detection (SED) and audio tagging (AT) at the same time. In this paper, we propose a task-aware mean teacher method that uses a convolutional recurrent neural network (CRNN) with a multi-branch structure to handle the SED and AT tasks separately. Specifically, a branch with coarse-level temporal resolution is designed for the AT task, while a branch with fine-level temporal resolution is designed for the SED task. The mean teacher based semi-supervised learning method is first adopted to improve the performance of the coarse-level AT branch by exploiting unlabeled data. The coarse-level AT branch then serves as a teacher that guides the aggregated AT output of the fine-level SED branch, which improves SED performance. To further improve AT and SED performance, information from multiple layers is exploited in the form of a multi-resolution feature. Experimental results on Task 4 of the DCASE 2018 challenge show that the proposed method achieves an F1-score of 37.7%, compared with 32.4% for the challenge's winning system.
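The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes the standard mean teacher formulation, in which the teacher weights are an exponential moving average (EMA) of the student weights. It further assumes that the frame-level SED output is aggregated to a clip-level AT output by mean pooling over frames, and that the consistency cost is a mean squared error; the abstract specifies neither choice.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               alpha: float = 0.999) -> List[float]:
    """Move each teacher weight toward the student weight by EMA.

    alpha is the EMA decay; 0.999 is an illustrative default,
    not a value taken from the paper.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def aggregate_frames(frame_probs: List[List[float]]) -> List[float]:
    """Collapse frame-level SED probabilities (frames x classes) into
    clip-level AT probabilities.

    Mean pooling over frames is an assumption made for illustration.
    """
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [sum(frame[c] for frame in frame_probs) / n_frames
            for c in range(n_classes)]


def consistency_loss(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two clip-level probability vectors.

    MSE is an assumed choice of consistency cost.
    """
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy usage: two frames, two event classes.
sed_frames = [[0.0, 1.0], [1.0, 1.0]]       # fine-level SED branch output
at_from_sed = aggregate_frames(sed_frames)  # aggregated AT output of the SED branch
at_teacher = [0.5, 0.0]                     # coarse-level AT branch acting as teacher
loss = consistency_loss(at_from_sed, at_teacher)
```

In this sketch the loss penalizes disagreement between the SED branch's aggregated clip-level prediction and the AT branch's prediction, which is how guidance from the coarse-level branch could reach the fine-level branch during training.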