Leveraging Large Language Models for Speech De-Identification
This paper presents a novel approach to the scarcity of labeled data in speech de-identification, a critical task for protecting personal privacy. Using a large language model, we propose a fully automated data augmentation strategy that generates synthetic speech text data enriched with diverse personally identifiable information (PII) entities. The augmented dataset is then used to train speech de-identification models, significantly improving their performance on spoken language. To further enhance de-identification accuracy, we explore both pipeline and end-to-end models. The pipeline approach applies speech recognition and named entity recognition (NER) in sequence, whereas the end-to-end model learns the two tasks jointly. Our experimental results demonstrate the effectiveness of the data augmentation strategy and the superiority of the end-to-end model in PII detection accuracy and robustness.
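As a rough illustration of the pipeline approach described above (speech recognition followed by entity tagging and masking), the following minimal Python sketch may help. It is not the paper's implementation: the recognizer is a hypothetical stub that returns a fixed transcript, and two regular expressions stand in for a trained NER model. All names (`PII_PATTERNS`, `tag_pii`, `mask`, `deidentify_pipeline`, `fake_recognizer`) are assumptions introduced for this example.

```python
import re
from typing import Callable, List, Tuple

# ASSUMPTION: regex patterns stand in for a trained NER model.
# A real system would use a learned tagger covering many more PII types.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{4}\s?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

Span = Tuple[int, int, str]  # (start, end, label)


def tag_pii(text: str) -> List[Span]:
    """Return PII spans found in the transcript, sorted by start offset."""
    spans = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def mask(text: str, spans: List[Span]) -> str:
    """Replace each tagged span with a bracketed label such as [PHONE]."""
    pieces = []
    cursor = 0
    for start, end, label in spans:
        if start < cursor:
            continue  # skip spans that overlap one already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def deidentify_pipeline(audio: bytes, recognize: Callable[[bytes], str]) -> str:
    """Pipeline de-identification: first transcribe, then tag and mask PII."""
    transcript = recognize(audio)
    return mask(transcript, tag_pii(transcript))


def fake_recognizer(audio: bytes) -> str:
    # ASSUMPTION: hypothetical stub in place of a real speech recognizer.
    return "my number is 9123 4567 and my email is alex@example.com"


print(deidentify_pipeline(b"", fake_recognizer))
```

Because the tagger only sees the recognizer's text output, any transcription error in a PII span propagates to the masking step. An end-to-end model that learns both tasks jointly does not have this hard hand-off between stages.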
History
Journal/Conference/Book title
International Journal of Asian Language Processing (IJALP)
Publication date
2025-02
Version
- Published
Corresponding author
Rong Tong
Project ID
- 15875 (R-R12-A405-0009) Automatic speech de-identification on Singapore English speech