Singapore Institute of Technology

Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction

conference contribution
posted on 2025-01-10, 06:45 authored by Qi Sun, Kun Huang, Xiaocui Yang, Rong Tong, Kun Zhang, Soujanya Poria

Document-level Relation Triplet Extraction (DocRTE) is a fundamental task in information systems that aims to simultaneously extract entities with semantic relations from a document. Existing methods heavily rely on a substantial amount of fully labeled data. However, collecting and annotating data for newly emerging relations is time-consuming and labor-intensive. Recent advanced Large Language Models (LLMs), such as ChatGPT and LLaMA, exhibit impressive long-text generation capabilities, inspiring us to explore an alternative approach for obtaining auto-labeled documents with new relations.
In this paper, we propose a Zero-shot Document-level Relation Triplet Extraction (ZeroDocRTE) framework, which Generates labeled data by Retrieval and Denoising Knowledge from LLMs, called GenRDK. Specifically, we propose a chain-of-retrieval prompt to guide ChatGPT to generate labeled long-text data step by step. To improve the quality of synthetic data, we propose a denoising strategy based on the consistency of cross-document knowledge. Leveraging our denoised synthetic data, we fine-tune LLaMA2-13B-Chat to extract document-level relation triplets. We perform experiments for both zero-shot document-level relation and triplet extraction on two public datasets. The experimental results show that our GenRDK framework outperforms strong baselines.
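
The abstract does not spell out how the consistency-based denoising works, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that "consistency of cross-document knowledge" can be approximated by counting how many synthetic documents support each (head, relation, tail) triplet and dropping triplets with too little support. The function name `denoise_by_consistency`, the `min_support` threshold, and the toy triplets are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# A triplet is (head entity, relation, tail entity).
Triplet = Tuple[str, str, str]


def denoise_by_consistency(
    docs: List[List[Triplet]], min_support: int = 2
) -> List[List[Triplet]]:
    """Keep only triplets that occur in at least `min_support` documents.

    Assumption: cross-document agreement is used as a stand-in for label
    reliability. The paper's actual strategy may differ.
    """
    support: Counter = Counter()
    for triplets in docs:
        # Count each triplet at most once per document.
        support.update(set(triplets))
    return [[t for t in triplets if support[t] >= min_support] for triplets in docs]


# Toy synthetic labels (illustrative only, not from the paper's data).
synthetic_docs = [
    [("Marie Curie", "award received", "Nobel Prize"),
     ("Marie Curie", "place of birth", "Paris")],
    [("Marie Curie", "award received", "Nobel Prize"),
     ("Marie Curie", "place of birth", "Warsaw")],
    [("Marie Curie", "place of birth", "Warsaw")],
]
cleaned = denoise_by_consistency(synthetic_docs)
```

In this toy run, the triplet naming Paris as the place of birth appears in only one document and is filtered out, while the triplets supported by two documents are kept.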

Funding

R-R12-A405-0009 AcRF Tier 1

History

Journal/Conference/Book title

WWW '24: The ACM Web Conference 2024, Singapore, 13-17 May 2024.

Publication date

2024-05-13

Version

  • Published

Corresponding author

zhangkun@njust.edu.cn

Project ID

  • 15875 Automatic speech de-identification on Singapore English speech