Singapore Institute of Technology

File(s) not publicly available

An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation

conference contribution
posted on 2024-04-03, 06:47 authored by Hui Wang, Yan Song, Zeng-Xi Li, Ian McLoughlin, Li-Rong Dai

Despite significant progress in deep-learning-based speech separation, extracting and tracking the speech of target speakers remains challenging, especially in single-channel, multi-speaker conditions. The authors previously proposed a source-aware context network that exploits the temporal context of both the mixture and the estimated sources for online speech separation. In this paper, we propose a speaker-aware approach built on the source-aware context network, in which speaker information is explicitly modeled by an auxiliary speaker identification branch. Speech separation and speaker tracking can then be jointly optimized through multi-task learning. We also study the effectiveness of time-domain representations by proposing a raw sparse waveform encoder that preserves discriminative information. Experimental results on the WSJ0-2mix benchmark show that the proposed system significantly improves the Signal-to-Distortion Ratio (SDR).
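The abstract states that separation and speaker tracking are jointly optimized by multi-task learning but does not give the objective. The following is a minimal sketch of one common way to build such an objective, not the authors' loss. It assumes a permutation-invariant (PIT) negative SI-SNR separation term, a swapped-in training criterion that is not the SDR metric reported above, plus a λ-weighted cross-entropy term from a hypothetical speaker-identification head. Speaker labels are aligned to the estimates through the best PIT permutation. The function names, the value λ = 0.1, and the toy signals are all hypothetical. The raw sparse waveform encoder is not modeled here.

```python
import itertools
import math
from typing import Sequence, Tuple

EPS = 1e-8  # numerical floor to avoid division by zero and log(0)


def si_snr(est: Sequence[float], ref: Sequence[float]) -> float:
    """Scale-invariant signal-to-noise ratio (dB) of one estimate against one reference."""
    n = len(ref)
    est_mean, ref_mean = sum(est) / n, sum(ref) / n
    e = [x - est_mean for x in est]  # zero-mean estimate
    r = [x - ref_mean for x in ref]  # zero-mean reference
    scale = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + EPS)
    target = [scale * b for b in r]  # projection of the estimate onto the reference
    noise = [a - t for a, t in zip(e, target)]  # residual orthogonal to the reference
    num = sum(t * t for t in target)
    den = sum(v * v for v in noise) + EPS
    return 10.0 * math.log10(num / den + EPS)


def pit_separation_loss(
    ests: Sequence[Sequence[float]], refs: Sequence[Sequence[float]]
) -> Tuple[float, Tuple[int, ...]]:
    """Permutation-invariant negative SI-SNR; perm[i] is the reference assigned to estimate i."""
    best_loss, best_perm = math.inf, tuple(range(len(refs)))
    for perm in itertools.permutations(range(len(refs))):
        loss = -sum(si_snr(ests[i], refs[perm[i]]) for i in range(len(ests))) / len(ests)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(
    ests: Sequence[Sequence[float]],
    refs: Sequence[Sequence[float]],
    spk_logits: Sequence[Sequence[float]],
    spk_labels: Sequence[int],
    lam: float = 0.1,  # hypothetical weight of the speaker-ID term
) -> float:
    """Separation loss plus lam * speaker-identification loss (hypothetical recipe)."""
    sep_loss, perm = pit_separation_loss(ests, refs)
    # spk_logits[i] is predicted from estimate i, so score it against the speaker
    # label of the reference that PIT matched to estimate i.
    spk_loss = sum(
        cross_entropy(spk_logits[i], spk_labels[perm[i]]) for i in range(len(ests))
    ) / len(ests)
    return sep_loss + lam * spk_loss


if __name__ == "__main__":
    # Toy two-speaker example (hypothetical signals): the model outputs the
    # sources in swapped order, and PIT recovers the matching.
    ref_a = [1.0, -1.0, 1.0, -1.0]
    ref_b = [1.0, 1.0, -1.0, -1.0]
    ests = [[0.5 * x for x in ref_b], [0.5 * x for x in ref_a]]
    spk_logits = [[0.0, 5.0], [5.0, 0.0]]  # estimate 0 resembles speaker 1, estimate 1 speaker 0
    print(multitask_loss(ests, [ref_a, ref_b], spk_logits, spk_labels=[0, 1]))
```

Aligning the speaker labels through the PIT permutation keeps the two loss terms consistent with each other. An online system that tracks speakers might instead fix the output order across blocks, and the abstract does not say which choice the authors made.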


Journal/Conference/Book title

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain.
