Singapore Institute of Technology
An effective speaker recognition method based on joint identification and verification supervisions

conference contribution
posted on 2024-04-03, 06:08 authored by Ying Liu, Yan Song, Yiheng Jiang, Ian McLoughlin, Lin Liu, Li-Rong Dai

Deep embedding learning based speaker verification methods have attracted significant recent research interest due to their superior performance. Existing methods mainly focus on designing frame-level feature extraction structures, utterance-level aggregation methods and loss functions to learn discriminative speaker embeddings. The scores of verification trials are then computed using cosine distance or Probabilistic Linear Discriminant Analysis (PLDA) classifiers. This paper proposes an effective speaker recognition method based on joint identification and verification supervisions, inspired by multi-task learning frameworks. Specifically, a deep architecture with a convolutional feature extractor, attentive pooling and two classifier branches is presented. The first, an identification branch, is trained with additive margin softmax loss (AM-Softmax) to classify the speaker identities. The second, a verification branch, trains a discriminator with binary cross-entropy loss (BCE) to optimize a new triplet-based mutual information. To balance the two losses during different training stages, a ramp-up/ramp-down weighting scheme is employed. Furthermore, an attentive bilinear pooling method is proposed to improve the effectiveness of embeddings. Extensive experiments on VoxCeleb1 show that the proposed method achieves a 22% relative reduction in equal error rate (EER) compared with the baseline system using identification supervision only.
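As an illustration of two components named in the abstract, the following is a minimal NumPy sketch of the standard AM-Softmax loss (as used for the identification branch) and of cosine scoring for a verification trial. It is not the authors' implementation: the function names, the array shapes, and the default values scale=30.0 and margin=0.35 are assumptions chosen for illustration, and the verification branch, the ramp-up/ramp-down weighting and the attentive bilinear pooling are not covered here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps guards against zero norm)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def am_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.35):
    """Mean additive margin softmax loss over a batch.

    embeddings: (batch, dim) speaker embeddings
    weights:    (num_speakers, dim) class weight vectors
    labels:     (batch,) integer speaker indices
    scale, margin: illustrative hyperparameters (assumed, not from the paper)
    """
    # Cosine similarity between each embedding and each class weight.
    cos = l2_normalize(embeddings) @ l2_normalize(weights).T
    logits = scale * cos
    rows = np.arange(len(labels))
    # Subtract the margin from the target-class cosine only.
    logits[rows, labels] = scale * (cos[rows, labels] - margin)
    # Numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())


def cosine_score(enroll, test):
    """Cosine similarity between two 1-D embeddings; higher means more similar."""
    return float(l2_normalize(enroll) @ l2_normalize(test))
```

With a positive margin the target-class logit is lowered, so the loss on the same batch is always larger than with margin=0.0; this is what pushes embeddings of the same speaker closer together in angle during training.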


Journal/Conference/Book title

Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, October 25–29, 2020, Shanghai, China.

Rights statement

Liu, Y., Song, Y., Jiang, Y., McLoughlin, I., Liu, L., Dai, L.-R. (2020) An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions. Proc. Interspeech 2020, 3007-3011, doi: 10.21437/Interspeech.2020-1922.
