This paper addresses data scarcity in speech de-identification by introducing a novel, fully automated data augmentation method that leverages large language models. The approach overcomes the limitations of human annotation and enables the creation of extensive training datasets. To improve de-identification performance, we compare pipeline and end-to-end models: the pipeline approach applies speech recognition and named entity recognition in sequence, while the end-to-end model learns the two tasks jointly. Experimental results demonstrate the effectiveness of the data augmentation strategy and the superiority of the end-to-end model in the accuracy and robustness of personally identifiable information (PII) detection.
History
Journal/Conference/Book title
International Conference on Advanced Informatics: Concepts, Theory and Applications, 2024
Publication date
2024-09-28
Version
Pre-print
Rights statement
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
Corresponding author
Rong Tong
Project ID
15875 Automatic speech de-identification on Singapore English speech