Singapore Institute of Technology

An effective mutual mean teaching based domain adaptation method for sound event detection

conference contribution
posted on 2024-04-03, 02:52 authored by Xu Zheng, Yan Song, Li-Rong Dai, Ian McLoughlin, Lin Liu

In this paper, we present a novel mutual mean teaching based domain adaptation (MMT-DA) method for the sound event detection (SED) task, which can effectively exploit synthetic data to improve SED performance. Existing methods simply treat the synthetic data as strongly-labeled data in a semi-supervised learning (SSL) framework. Benefiting from the strong labels of the synthetic data, superior SED performance can be achieved. However, the distribution mismatch between synthetic and real data poses an evident challenge for domain adaptation (DA). In MMT-DA, convolutional recurrent neural networks (CRNNs) learned from different datasets (i.e., total data, comprising real and synthetic data, and real data only) are exploited for DA. Specifically, the mean teacher method using a CRNN is employed to utilize the unlabeled real data. To compensate for the domain diversity, an additional domain classifier with a gradient reverse layer (GRL) is used for training a mean teacher on the total data. The student CRNNs are mutually taught using the soft predictions of unlabeled data obtained from the different teachers. Furthermore, a strip pooling based attention module is used to model the inter-dependencies between channels and time-frequency dimensions and thereby exploit structural information. Experimental results on Task 4 of DCASE 2020 demonstrate the effectiveness of the proposed method, which achieves a 52.0% F1-score on the validation dataset, outperforming the winning system’s 50.6%.
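The three generic building blocks named in the abstract (a mean teacher updated from its student, a gradient reverse layer, and a consistency term against another teacher's soft predictions) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function and class names, the smoothing factor `alpha`, the scaling factor `lam`, and the choice of mean squared error as the consistency measure are all assumptions made for illustration, and parameters are represented as plain lists of floats rather than CRNN weights.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, alpha: float = 0.999) -> Params:
    """Move teacher weights toward the student by an exponential moving average.

    alpha is a hypothetical smoothing factor; the abstract does not state one.
    """
    return {
        name: [alpha * t + (1.0 - alpha) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


class GradientReversal:
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam  # hypothetical scaling factor for the reversed gradient

    def forward(self, x: List[float]) -> List[float]:
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_output]


def consistency_loss(student_probs: List[float], teacher_probs: List[float]) -> float:
    """Mean squared error between a student's predictions and a teacher's soft
    predictions (the error measure is an assumption, not taken from the abstract)."""
    return sum((s - t) ** 2
               for s, t in zip(student_probs, teacher_probs)) / len(student_probs)
```

In a mutual teaching setup along the lines the abstract describes, each student would be compared against the soft predictions of the teacher trained on the other dataset, while each teacher would be refreshed from its own student with `ema_update` after every training step.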


Journal/Conference/Book title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 30 August – 3 September, 2021, Brno, Czechia.


Rights statement

Zheng, X., Song, Y., Dai, L.-R., McLoughlin, I., Liu, L. (2021) An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection. Proc. Interspeech 2021, 556-560, doi: 10.21437/Interspeech.2021-281.
