Singapore Institute of Technology

DP-MAE: A DUAL-PATH MASKED AUTOENCODER BASED SELF-SUPERVISED LEARNING METHOD FOR ANOMALOUS SOUND DETECTION

conference contribution
posted on 2025-03-21, 08:51 authored by Zhuo-Li Liu, Yan Song, Xiao‐Min Zeng, Li-Rong Dai, Ian McLoughlin

In this paper, we present a novel general-purpose audio representation learning method named Dual-Path Masked AutoEncoder (DP-MAE) for the anomalous sound detection (ASD) task. Existing methods mainly rely on frame-level generative or clip-level discriminative modeling, which generally ignores the local information where anomalies are usually found more easily. Moreover, they apply multiple systems to a single ASD task, which limits generalizability. To tackle this, our method extracts patch-level features through self-supervised representation learning, yielding a unified audio representation that generalizes well and models the local information that is beneficial for detecting anomalies under domain shifts. It further optimizes the informativeness of clip-level representations during fine-tuning. Concretely, the input spectrograms are randomly split into two patch-level subsets, which are then fed into DP-MAE to predict each other. Meanwhile, the output of one path also serves as the prediction target of the other path, providing regularization from the perspective of self-distillation. In the fine-tuning stage, a linear classifier is applied to the features produced by the encoder to obtain a more compact representation of normal sound. Experiments on the DCASE 2022 Challenge Task 2 development dataset show the effectiveness of our method.
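The random split into two patch-level subsets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the patch size, the 50/50 split ratio, and the function names `patchify` and `dual_path_split` are assumptions made for illustration, and the encoder, decoder, and self-distillation loss are omitted entirely.

```python
import random


def patchify(spec, patch_h, patch_w):
    """Cut a 2-D spectrogram (list of rows) into non-overlapping patches.

    Hypothetical helper: the abstract does not state the patch size.
    """
    n_rows, n_cols = len(spec), len(spec[0])
    patches = []
    for r in range(0, n_rows - patch_h + 1, patch_h):
        for c in range(0, n_cols - patch_w + 1, patch_w):
            patches.append([row[c:c + patch_w] for row in spec[r:r + patch_h]])
    return patches


def dual_path_split(num_patches, ratio=0.5, seed=None):
    """Randomly partition patch indices into two disjoint, complementary subsets.

    The ratio of 0.5 is an assumption; each subset would feed one path,
    and each path would be trained to predict the other subset.
    """
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    cut = int(num_patches * ratio)
    return sorted(indices[:cut]), sorted(indices[cut:])


# Toy 8 x 16 "spectrogram" (frequency bins x time frames).
spec = [[float(r * 16 + c) for c in range(16)] for r in range(8)]
patches = patchify(spec, patch_h=4, patch_w=4)  # 2 x 4 grid of patches
subset_a, subset_b = dual_path_split(len(patches), seed=0)
```

In this sketch the two index lists never overlap and together cover every patch, so each path sees exactly the patches hidden from the other.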

Journal/Conference/Book title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Publication date

2024-04-14
