Singapore Institute of Technology
LLM-Enhanced Spoken Named Entity Recognition leveraging ASR N-best Hypotheses

conference contribution
posted on 2025-07-04, 01:13 authored by Farhan Azmi, Rong Tong

Identifying Personally Identifiable Information (PII) in spoken documents is crucial for privacy preservation in speech processing. Unlike written text, spoken language exhibits greater variability due to factors such as accent, emotion, hesitation, and vocabulary choice, which can complicate PII detection. A standard approach applies Automatic Speech Recognition (ASR) followed by Named Entity Recognition (NER) to identify PII in speech input. However, PII discovery depends heavily on ASR accuracy, and the inherent complexities of speech production can lead to ASR errors that hinder PII detection. To address this limitation, we propose a novel method that integrates an LLM-based module after ASR to perform error correction and PII tagging, leveraging the richer contextual information available in the n-best outputs from the ASR system. We systematically investigate various prompting strategies, including Zero-shot, Few-shot, and Chain-of-Thought prompting, to guide the LLM. Our experimental results demonstrate that the LLM-based error correction yields a substantial F1 improvement on PII tagging. Furthermore, incorporating the n-best list consistently improves the F1 score, and Chain-of-Thought prompting outperforms both Zero-shot and Few-shot prompting.
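
As an illustration of the idea described in the abstract, the following minimal sketch shows one way an ASR n-best list could be assembled into an LLM prompt under the three prompting strategies. It is not the authors' implementation: the instruction wording, the strategy names ("zero_shot", "few_shot", "cot"), and the helpers format_nbest and build_prompt are all hypothetical, and the call to an actual LLM is deliberately left out.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction text; the prompts used in the paper are not
# given in the abstract, so this wording is an assumption.
TASK = (
    "You are given the n-best hypotheses from a speech recogniser for one "
    "utterance. Produce the most likely corrected transcript and mark "
    "every PII span."
)

# Hypothetical Chain-of-Thought cue appended for the "cot" strategy.
COT_SUFFIX = (
    "Think step by step: compare the hypotheses, resolve the words where "
    "they disagree, then mark the PII."
)


def format_nbest(hypotheses: List[str]) -> str:
    """Number the hypotheses in ASR rank order (1 = top-ranked)."""
    return "\n".join(f"{i}. {h}" for i, h in enumerate(hypotheses, start=1))


def build_prompt(
    hypotheses: List[str],
    strategy: str = "zero_shot",
    examples: Optional[List[Tuple[List[str], str]]] = None,
) -> str:
    """Assemble a prompt string for one utterance.

    examples: (n-best list, expected output) pairs, used only for few_shot.
    """
    if strategy not in {"zero_shot", "few_shot", "cot"}:
        raise ValueError(f"unknown strategy: {strategy}")

    parts = [TASK]
    if strategy == "few_shot":
        # Each worked example precedes the real query.
        for ex_hyps, ex_out in examples or []:
            parts.append(
                "Hypotheses:\n" + format_nbest(ex_hyps) + "\nOutput: " + ex_out
            )
    parts.append("Hypotheses:\n" + format_nbest(hypotheses))
    if strategy == "cot":
        parts.append(COT_SUFFIX)
    parts.append("Output:")
    return "\n\n".join(parts)
```

Calling build_prompt(["my name is john tan", "my name is jon tan"], "cot") returns a single string that lists both hypotheses in rank order, followed by the reasoning cue; passing only the top-ranked hypothesis corresponds to the 1-best baseline against which the n-best input is compared.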

History

Journal/Conference/Book title

International Conference on Asian Language Processing (IALP) 2025

Publication date

2025-08

Version

  • Pre-print

Corresponding author

Rong Tong

Project ID

  • 15875 (R-R12-A405-0009) Automatic speech de-identification on Singapore English speech
