Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift
This paper proposes a self-supervised representation learning method for anomalous sound detection (ASD). ASD has received much research attention in recent DCASE challenges. It aims to identify whether a sound emitted from a machine is anomalous or not, given only normal sound data. The task is challenging because the time-frequency characteristics of sounds vary widely across machine types, and because many attributes can alter a machine's sound without indicating an anomaly. This is especially true for domain shift tasks, where only a few training sound clips are available for the target domain. From the perspective of self-supervised learning, each given sound clip can be considered a transformation of an original clean sound, and the attributes of each clip can serve as different supervision signals. We propose a unified representation learning framework, equipped with a time-frequency attention mechanism, to perform ASD across different machine types and attributes. For domain shift, we present a centre imprinting method, which directly sets the centres for target domain attributes. This immediately provides a good representation and an initialization for further fine-tuning. Evaluation on the DCASE 2021 ASD task demonstrates the effectiveness of the proposed method.
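The idea of centre imprinting can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract gives no details, so everything below is an assumption. It assumes that embeddings have already been extracted by some trained encoder, that a centre is set to the normalized mean of the L2-normalized embeddings of the few target-domain clips sharing an attribute, and that the anomaly score is the cosine distance to that centre. The function names `imprint_centres` and `anomaly_score` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def imprint_centres(embeddings, attribute_ids):
    """Set one centre per attribute directly from a few embeddings.

    Assumption: a centre is the normalized mean of the normalized
    embeddings of all clips that share an attribute. No gradient
    step is involved, so the centres are available immediately.
    """
    embeddings = l2_normalize(np.asarray(embeddings, dtype=float))
    attribute_ids = np.asarray(attribute_ids)
    centres = {}
    for attr in np.unique(attribute_ids):
        mean = embeddings[attribute_ids == attr].mean(axis=0)
        centres[attr.item()] = l2_normalize(mean)
    return centres


def anomaly_score(embedding, centre):
    """Cosine distance to a centre (assumed scoring rule); larger is more anomalous."""
    embedding = l2_normalize(np.asarray(embedding, dtype=float))
    return 1.0 - float(np.dot(embedding, centre))


# Toy usage: three target-domain clips with 2-D embeddings, two attributes.
target_embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
target_attributes = [0, 0, 1]
centres = imprint_centres(target_embeddings, target_attributes)
```

Under these assumptions, the imprinted centres could then be used as the starting point for fine-tuning, with the centres updated together with the encoder.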