Singapore Institute of Technology

META REPRESENTATION LEARNING METHOD FOR ROBUST SPEAKER VERIFICATION IN UNSEEN DOMAINS

conference contribution
posted on 2025-03-21, 08:37 authored by Jian-Tao Zhang, Yan Song, Jin Li, Wu Guo, Hao-Yu Song, Ian McLoughlin

This paper presents a meta representation learning method for robust speaker verification (SV) in unseen domains. Existing SV systems based on embedding learning are known to be susceptible to domain mismatch. To address this, we propose an episodic training procedure that compensates for domain mismatch conditions at runtime. Specifically, episodes are constructed by domain-balanced episodic sampling from two different domains, and a new domain alignment (DA) module is added to existing network structures alongside the feature extractor (FE) and classifier. In each episodic training iteration, the FE and DA modules are optimized separately with different objectives to improve the robustness of learning. In addition, a cross-domain inter-class alignment (CDICA) loss is proposed to improve domain generalization ability. Experimental results on the CNCeleb and VoxCeleb benchmarks demonstrate significant performance gains for SV in unseen domains.
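
The abstract does not spell out how domain-balanced episodic sampling is carried out, so the following is only a minimal sketch of one plausible reading: each episode draws two domains at random and takes the same number of speakers, and the same number of utterances per speaker, from each. The function name `sample_episode`, its parameters `n_speakers` and `n_utts`, and the `(utt_id, speaker, domain)` tuple layout are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def sample_episode(utterances, n_speakers=2, n_utts=2, rng=None):
    """Draw one episode with equal contributions from two domains.

    utterances: iterable of (utt_id, speaker, domain) tuples (assumed layout).
    Returns a list of (utt_id, speaker, domain) tuples of length
    2 * n_speakers * n_utts.
    """
    rng = rng or random.Random()

    # Index utterances as domain -> speaker -> [utt_id, ...].
    index = defaultdict(lambda: defaultdict(list))
    for utt_id, speaker, domain in utterances:
        index[domain][speaker].append(utt_id)

    # Keep speakers with enough utterances, and domains with enough such speakers.
    eligible = {}
    for domain, speakers in index.items():
        ok = sorted(s for s, utts in speakers.items() if len(utts) >= n_utts)
        if len(ok) >= n_speakers:
            eligible[domain] = ok
    if len(eligible) < 2:
        raise ValueError("need at least two domains with enough speakers")

    # Two domains per episode, identical speaker and utterance counts in each.
    episode = []
    for domain in rng.sample(sorted(eligible), 2):
        for speaker in rng.sample(eligible[domain], n_speakers):
            for utt_id in rng.sample(index[domain][speaker], n_utts):
                episode.append((utt_id, speaker, domain))
    return episode
```

Under these assumptions every episode is balanced by construction, so neither domain dominates the batch seen in a given training iteration. How the episode is then split between the FE and DA objectives is not stated in the abstract and is left out of the sketch.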

Journal/Conference/Book title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Publication date

2024-04-14

Version

  • Post-print

Rights statement

© 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
