Singapore Institute of Technology

File(s) not publicly available

Domain Robust Deep Embedding Learning for Speaker Recognition

conference contribution
posted on 2023-10-01, 00:58 authored by Hang-Rui Hu, Yan Song, Ying Liu, Li-Rong Dai, Ian McLoughlin, Lin Liu

This paper presents a domain robust deep embedding learning method for speaker verification (SV) tasks. Most recent methods utilize deep neural networks (DNN) to learn compact and discriminative speaker embeddings from large-scale labeled datasets such as VoxCeleb and the NIST SRE corpus. Despite the success of existing methods, performance may degrade significantly on new target datasets, mainly due to the distribution discrepancy between training and test domains. Moreover, corpora differ in how they were collected and in the languages they contain, so they may span multiple, possibly mismatched, latent domains. To address this, a multi-task end-to-end framework is proposed to learn speaker embeddings from both labeled source and unlabeled target datasets. Motivated by label smoothing, a smoothed knowledge distillation (SKD) based self-supervised learning method is designed to exploit latent structural information from the unlabeled target domain. Furthermore, a domain-aware batch normalization (DABN) module aims to reduce the cross-domain distribution discrepancy, while a domain-agnostic instance normalization (DAIN) module aims to learn features that are robust to within-domain variance. Evaluation on NIST SRE16 demonstrates significant performance gains.
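As a rough illustration of two ideas named in the abstract, the sketch below shows (a) batch normalization computed with separate statistics for the source and target domains, and (b) standard label smoothing, which the abstract cites as the motivation for SKD. This is not the authors' implementation: the abstract does not specify how DABN or SKD are defined, so the per-domain-statistics reading of DABN, the function names (`batch_norm`, `domain_aware_batch_norm`, `smooth_labels`), and the parameters `eps` and `alpha` are all assumptions made for illustration.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize each feature dimension to zero mean and unit variance,
    using statistics computed from this batch only."""
    n = len(batch)
    dim = len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dim)]
    variances = [
        sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dim)
    ]
    return [
        [(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dim)]
        for x in batch
    ]


def domain_aware_batch_norm(source_batch, target_batch, eps=1e-5):
    """ASSUMPTION: one plausible reading of "domain-aware" normalization.
    Each domain is normalized with its own mean and variance, so a shift
    or rescaling between the two domains is removed."""
    return batch_norm(source_batch, eps), batch_norm(target_batch, eps)


def smooth_labels(one_hot, alpha=0.1):
    """Standard label smoothing: mix the one-hot target with a uniform
    distribution over the k classes, weighted by alpha."""
    k = len(one_hot)
    return [(1.0 - alpha) * p + alpha / k for p in one_hot]


# Toy features: the target domain is shifted and rescaled relative to source.
source = [[0.0, 10.0], [2.0, 14.0]]
target = [[100.0, -5.0], [104.0, -1.0]]
norm_source, norm_target = domain_aware_batch_norm(source, target)
smoothed = smooth_labels([1.0, 0.0, 0.0, 0.0], alpha=0.1)
```

In this toy case both normalized batches end up with per-dimension zero mean and near-unit variance, even though the raw source and target features sit in very different ranges.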


Journal/Conference/Book title

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
