Singapore Institute of Technology

On multitask loss function for audio event detection and localization

conference contribution
posted on 2024-04-03, 08:52 authored by Huy Phan, Lam Pham, Philipp Koch, Ngoc Q. K. Duong, Ian McLoughlin, Alfred Mertins

Audio event localization and detection (SELD) has been commonly tackled using multitask models. Such a model usually consists of a multi-label event classification branch with sigmoid cross-entropy loss for event activity detection and a regression branch with mean squared error loss for direction-of-arrival estimation. In this work, we propose a multitask regression model, in which both (multi-label) event detection and localization are formulated as regression problems and the mean squared error loss is used homogeneously for model training. We show that the common combination of heterogeneous loss functions causes the network to underfit the data, whereas the homogeneous mean squared error loss leads to better convergence and performance. Experiments on the development and validation sets of the DCASE 2020 SELD task demonstrate that the proposed system also outperforms the DCASE 2020 SELD baseline across all the detection and localization metrics, reducing the overall SELD error (the combined metric) by approximately 10% absolute.
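
To make the contrast between the two training objectives concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single time frame, one activity value per event class, and a flat list of direction-of-arrival (DOA) coordinates. The function names, the `doa_weight` parameter, and the unweighted sum of the two branch losses are illustrative assumptions; the abstract does not specify how the branches are weighted or masked.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mse(pred, target):
    """Mean squared error over two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped for stability."""
    total = 0.0
    for p, t in zip(prob, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(prob)


def heterogeneous_loss(activity_logits, activity_true, doa_pred, doa_true,
                       doa_weight=1.0):
    """Common setup: sigmoid cross-entropy for detection + MSE for DOA.

    doa_weight is a hypothetical balancing factor, not taken from the paper.
    """
    probs = [sigmoid(z) for z in activity_logits]
    return (binary_cross_entropy(probs, activity_true)
            + doa_weight * mse(doa_pred, doa_true))


def homogeneous_loss(activity_pred, activity_true, doa_pred, doa_true,
                     doa_weight=1.0):
    """Regression setup: MSE for both detection and DOA.

    activity_pred holds real-valued regression outputs compared directly
    against the 0/1 activity targets.
    """
    return (mse(activity_pred, activity_true)
            + doa_weight * mse(doa_pred, doa_true))


# Toy frame with three event classes, only the first one active.
activity_true = [1.0, 0.0, 0.0]
doa_true = [0.5, -0.5, 0.0]   # illustrative coordinates for the active event
doa_pred = [0.4, -0.4, 0.1]

het = heterogeneous_loss([2.0, -1.0, -3.0], activity_true, doa_pred, doa_true)
hom = homogeneous_loss([0.9, 0.2, 0.0], activity_true, doa_pred, doa_true)
```

In the homogeneous variant both terms are squared errors, so they share the same functional form, whereas the heterogeneous variant mixes a log-likelihood term with a squared-error term.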


Journal/Conference/Book title

Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE 2020), November 2-3, 2020, Tokyo, Japan.

Publication date

2020-11-02

Version

  • Published
