Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection

Li, Kang; Song, Yan; McLoughlin, Ian; Liu, Lin; Li, Jin; Dai, Li-Rong

doi:10.21437/Interspeech.2023-1174

conference contribution

posted on 2023-10-01, 00:58 authored by Kang Li, Yan Song, Ian McLoughlin, Lin Liu, Jin Li, Li-Rong Dai

In this paper, we present a task-aware fine-tuning method for transferring the Patchout faSt Spectrogram Transformer (PaSST) model to the sound event detection (SED) task. Pretrained PaSST has shown strong performance on audio tagging (AT) and SED tasks, but fine-tuning the model from a single layer is suboptimal because local and semantic information are not well exploited. To address this, we first introduce task-aware adapters, namely an SED-adapter and an AT-adapter, to fine-tune PaSST for the SED and AT tasks respectively. We then propose task-aware fine-tuning, which builds on these adapters to combine local information from a shallower layer with semantic information from a deeper layer. In addition, we propose the self-distillated mean teacher (SdMT) to train a robust student model with soft pseudo labels from the teacher. Experiments on the DCASE2022 Task 4 development set achieve an EB-F1 of 64.85% and a PSDS1 of 0.5548, outperforming previous state-of-the-art systems.
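As a rough illustration of the mean teacher idea mentioned in the abstract, the sketch below shows the two ingredients of a conventional mean teacher setup: the teacher weights track an exponential moving average (EMA) of the student weights, and the student is trained against the teacher's soft (probabilistic) pseudo labels. This is a generic sketch, not the paper's SdMT; the abstract does not specify how SdMT differs. The function names, the decay value, and the toy numbers are all hypothetical.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Move each teacher weight toward the matching student weight (EMA).

    `decay` is an assumed hyperparameter, not a value from the paper.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_label_bce(student_logits, teacher_probs):
    """Binary cross-entropy of student predictions against soft teacher
    targets, averaged over event classes."""
    eps = 1e-7
    total = 0.0
    for z, q in zip(student_logits, teacher_probs):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
    return total / len(student_logits)


# Toy example: two weights, three event classes (hypothetical numbers).
teacher_w = ema_update([0.0, 1.0], [1.0, 0.0], decay=0.9)

teacher_logits = [2.0, -1.0, 0.5]
teacher_probs = [sigmoid(z) for z in teacher_logits]  # soft pseudo labels

loss_mismatch = soft_label_bce([1.5, -0.5, 0.0], teacher_probs)
loss_match = soft_label_bce(teacher_logits, teacher_probs)
```

In this toy run the loss is lowest when the student's logits match the teacher's, which is the behaviour a consistency term of this kind is meant to encourage. In a full system, the EMA update would be applied to every model parameter after each optimizer step.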

Journal/Conference/Book title

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 20-24 August 2023, Dublin, Ireland

Publication date

2023-08-21

Keywords

sound event detection, transformer, task-aware fine-tune, mean teacher

Licence

In Copyright
