This paper proposes SpeeDF, a three-step framework for anonymizing speech data, with a focus on Singaporean English (Singlish). SpeeDF targets less-studied types of Personally Identifiable Information (PII), such as NRIC and passport numbers, which traditional de-identification methods often overlook. Unlike approaches that focus solely on entity extraction, SpeeDF combines automatic speech recognition (ASR), named entity recognition (NER), and information anonymization in a single pipeline. This end-to-end approach redacts PII thoroughly while preserving the naturalness and usability of the anonymized speech data for research and downstream applications.
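The three-step flow described above (ASR, then NER, then anonymization) can be illustrated with a minimal text-only sketch. This is not the SpeeDF implementation: the function names `transcribe`, `find_pii`, `anonymize`, and `deidentify` are hypothetical, the ASR step is a stub that returns a canned transcript, the NER step is replaced by a simple regular expression that assumes an NRIC is written as a prefix letter, seven digits, and a checksum letter, and the anonymization step masks text only rather than modifying audio.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: an NRIC/FIN appears in the transcript as a prefix letter,
# seven digits, and a trailing checksum letter (e.g. "S1234567D").
NRIC_PATTERN = re.compile(r"\b[STFGM]\d{7}[A-Z]\b", re.IGNORECASE)


@dataclass
class Entity:
    label: str  # entity type, e.g. "NRIC"
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


def transcribe(audio_path: str) -> str:
    """Step 1 (ASR), stubbed: a real system would run a speech recognizer."""
    return "my NRIC is S1234567D and I stay in Bedok"


def find_pii(text: str) -> List[Entity]:
    """Step 2 (NER), simplified: a regex stands in for a trained NER model."""
    return [Entity("NRIC", m.start(), m.end()) for m in NRIC_PATTERN.finditer(text)]


def anonymize(text: str, entities: List[Entity]) -> str:
    """Step 3: replace each detected span with a placeholder tag."""
    # Work from right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + f"[{ent.label}]" + text[ent.end:]
    return text


def deidentify(audio_path: str) -> str:
    """Chain the three steps into one pipeline."""
    text = transcribe(audio_path)
    return anonymize(text, find_pii(text))


if __name__ == "__main__":
    print(deidentify("example.wav"))
```

In a speech setting, the character offsets would additionally need to be mapped back to word-level timestamps so that the corresponding audio segments could be masked or replaced; that mapping is omitted here.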
History
Publication date
2024-12-02
Corresponding author
Rong Tong
Project ID
15875 Automatic speech de-identification on Singapore English speech