LLM-Enhanced Spoken Named Entity Recognition leveraging ASR N-best Hypotheses
Identifying Personally Identifiable Information (PII) from spoken documents is crucial for privacy preservation in speech processing. Unlike written text, spoken language exhibits greater variability due to factors such as accent, emotion, hesitation, and vocabulary choice, which can complicate PII detection. A standard approach involves using Automatic Speech Recognition (ASR) followed by Named Entity Recognition (NER) to identify PII from speech input. However, the accuracy of ASR is pivotal for effective PII discovery, and the inherent complexities of speech production can lead to ASR errors, hindering PII detection. To address this limitation, we propose a novel method that integrates an LLM-based module after ASR to perform error correction and PII tagging, leveraging the richer contextual information available in the n-best outputs from the ASR system. We systematically investigate various prompting strategies, including Zero-shot, Few-shot, and Chain-of-Thought prompting, to guide the LLM. Our experimental results demonstrate that the LLM-based error correction yields a substantial F1 improvement on PII tagging. Furthermore, incorporating the n-best list consistently improves the F1 score, and Chain-of-Thought prompting outperforms other strategies like Zero-shot and Few-shot prompting.
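As a rough illustration of how an ASR n-best list might be passed to an LLM for joint error correction and PII tagging, the sketch below assembles a prompt from ranked hypotheses. The function name `build_prompt`, the instruction wording, the strategy keys, and the sample utterance are all hypothetical. They are not the prompts or data used in the paper, and the call to an actual LLM is left out.

```python
from typing import List

# Hypothetical instruction texts, one per prompting strategy.
# These are illustrative assumptions, not the prompts used in the paper.
STRATEGY_INSTRUCTIONS = {
    "zero_shot": (
        "Produce the most likely corrected transcript and mark every PII "
        "span with [PII] ... [/PII] tags."
    ),
    "chain_of_thought": (
        "First compare the hypotheses and explain where they disagree. "
        "Then produce the most likely corrected transcript and mark every "
        "PII span with [PII] ... [/PII] tags."
    ),
}


def build_prompt(nbest: List[str], strategy: str = "zero_shot") -> str:
    """Format ranked ASR hypotheses and an instruction into one prompt string."""
    if not nbest:
        raise ValueError("n-best list must contain at least one hypothesis")
    if strategy not in STRATEGY_INSTRUCTIONS:
        raise ValueError(f"unknown strategy: {strategy}")
    # Number the hypotheses so the model can refer to them by rank.
    numbered = "\n".join(f"{i}. {hyp}" for i, hyp in enumerate(nbest, start=1))
    return (
        "The following are ASR hypotheses for one utterance, best first.\n"
        f"{numbered}\n\n"
        f"{STRATEGY_INSTRUCTIONS[strategy]}"
    )


# Made-up example utterance with two competing hypotheses.
example_nbest = [
    "my name is jon tan and i live in bedok",
    "my name is john tan and i live in bedok",
]
example_prompt = build_prompt(example_nbest, strategy="chain_of_thought")
```

A few-shot variant could prepend worked examples to the same template; the resulting string would then be sent to whichever LLM is being evaluated.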
Journal/Conference/Book title
International Conference on Asian Language Processing (IALP) 2025
Publication date
2025-08
Version
- Pre-print
Corresponding author
Rong Tong
Project ID
- 15875 (R-R12-A405-0009) Automatic speech de-identification on Singapore English speech