iText2KG: A Zero-Shot Method for Building Knowledge Graphs from Unstructured Text

Constructing knowledge graphs (KGs) from unstructured data is difficult because meaningful information must first be extracted from raw text and then given a consistent structure. Researchers from INSA Lyon, CNRS, and Université Claude Bernard Lyon 1 introduce iText2KG, a zero-shot, topic-independent method for incrementally constructing KGs from unstructured data without the need for predefined ontologies or post-processing.

iText2KG consists of four distinct modules:

Document Distiller: Reformulates raw documents into semantic blocks using large language models guided by a flexible, user-defined schema.
Incremental Entity Extractor: Extracts unique entities from the semantic blocks, ensuring no duplications or semantic ambiguities.
Incremental Relation Extractor: Identifies and extracts semantically unique relationships between entities.
Graph Integrator: Visualizes the entities and relationships in a KG using Neo4j, allowing for structured representation of data.

This modular design separates entity and relation extraction tasks, leading to improved precision and consistency. Moreover, the use of a zero-shot learning paradigm ensures adaptability across various domains without the need for fine-tuning or retraining, making it a flexible, accurate, and scalable solution for KG construction.
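To make the separation of concerns concrete, here is a minimal sketch of how such an incremental pipeline could be wired together. It is not the iText2KG code: the names `build_incrementally`, `KnowledgeGraph`, and `Relation` are hypothetical, and the three `toy_*` functions are trivial rule-based stand-ins for the LLM-backed modules, included only so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Relation:
    source: str
    label: str
    target: str


@dataclass
class KnowledgeGraph:
    entities: Set[str] = field(default_factory=set)
    relations: Set[Relation] = field(default_factory=set)


def build_incrementally(
    documents: Iterable[str],
    distill: Callable[[str], List[str]],
    extract_entities: Callable[[str, Set[str]], Set[str]],
    extract_relations: Callable[[str, Set[str]], Set[Relation]],
) -> KnowledgeGraph:
    """Pass each document through the modules, growing one shared graph."""
    graph = KnowledgeGraph()
    for doc in documents:
        for block in distill(doc):
            # Entity extraction runs first and sees what is already known.
            graph.entities |= extract_entities(block, graph.entities)
            # Relation extraction runs separately, over the updated entity set.
            graph.relations |= extract_relations(block, graph.entities)
    return graph


# Toy stand-ins (assumptions for illustration, not the real modules).
def toy_distill(doc: str) -> List[str]:
    # One "semantic block" per sentence.
    return [s.strip() for s in doc.split(".") if s.strip()]


def toy_entities(block: str, known: Set[str]) -> Set[str]:
    # Treat capitalized words as entities.
    return {w for w in block.split() if w[0].isupper()}


def toy_relations(block: str, entities: Set[str]) -> Set[Relation]:
    # Recognize only three-word "Subject verb Object" blocks.
    words = block.split()
    if len(words) == 3 and words[0] in entities and words[2] in entities:
        return {Relation(words[0], words[1], words[2])}
    return set()


graph = build_incrementally(
    ["Alice knows Bob. Bob mentors Carol."],
    toy_distill,
    toy_entities,
    toy_relations,
)
```

Because each stage is passed in as a plain callable, any one of them can be swapped (for example, a different entity extractor) without touching the others, which is the practical benefit of the modular split described above.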

iText2KG processes documents incrementally by passing them through its four core modules. First, the Document Distiller module restructures raw text into semantic blocks based on a flexible, user-defined schema, which can be adapted to different types of documents such as scientific papers, CVs, or websites. These semantic blocks are then fed into the Incremental Entity Extractor, which identifies entities and ensures that each one is unique, resolving potential duplicates and ambiguities with similarity measures such as cosine similarity.
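The entity-resolution step can be illustrated with a small sketch of cosine-similarity matching. This is an assumption-laden illustration rather than the authors' implementation: `resolve_entity` and the `0.9` threshold are hypothetical, and the three-dimensional vectors are hand-written toy values standing in for embeddings that a real system would obtain from an embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def resolve_entity(
    name: str,
    vector: Sequence[float],
    known: Dict[str, List[float]],
    threshold: float = 0.9,  # hypothetical cut-off, chosen for the example
) -> str:
    """Return the best-matching known entity, or register `name` as new."""
    best_name, best_score = None, threshold
    for known_name, known_vec in known.items():
        score = cosine_similarity(vector, known_vec)
        if score >= best_score:
            best_name, best_score = known_name, score
    if best_name is not None:
        return best_name  # treated as a duplicate of an existing entity
    known[name] = list(vector)
    return name


# Toy vectors (made up for illustration only).
known_entities: Dict[str, List[float]] = {}
first = resolve_entity("Large Language Model", [0.9, 0.1, 0.0], known_entities)
second = resolve_entity("LLM", [0.88, 0.12, 0.01], known_entities)
third = resolve_entity("Neo4j", [0.0, 0.2, 0.95], known_entities)
```

Here the second mention points in almost the same direction as the first and is merged into it, while the third is far from anything known and is registered as a new entity. The threshold controls the trade-off: set too low, distinct entities get merged; set too high, duplicates slip through.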

The Incremental Relation Extractor then extracts relationships between the identified entities, leveraging both local and global document contexts to ensure the accuracy of the relationships. Finally, the Graph Integrator consolidates these entities and relationships into a visual knowledge graph using Neo4j, providing a coherent and structured representation of the data.
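As a rough sketch of what loading such a graph into Neo4j can look like, the snippet below turns entities and relations into parameterized Cypher `MERGE` statements. The function names, the `Entity` node label, and the `name` property are assumptions made for this example and are not taken from iText2KG. Cypher does not accept a relationship type as a query parameter, so the type is sanitized and interpolated into the query text, while entity names are passed as parameters.

```python
import re
from typing import Dict, Iterable, List, Tuple

Query = Tuple[str, Dict[str, str]]


def to_relationship_type(label: str) -> str:
    """Normalize a free-text label into a safe type, e.g. 'works at' -> 'WORKS_AT'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", label).strip("_").upper()
    if not cleaned:
        raise ValueError(f"cannot derive a relationship type from {label!r}")
    if cleaned[0].isdigit():
        cleaned = "REL_" + cleaned
    return cleaned


def build_merge_queries(
    entities: Iterable[str],
    relations: Iterable[Tuple[str, str, str]],
) -> List[Query]:
    """Return (cypher, parameters) pairs; MERGE keeps repeated loads idempotent."""
    queries: List[Query] = []
    for name in entities:
        queries.append(("MERGE (e:Entity {name: $name})", {"name": name}))
    for source, label, target in relations:
        rel_type = to_relationship_type(label)
        cypher = (
            "MATCH (a:Entity {name: $source}), (b:Entity {name: $target}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
        queries.append((cypher, {"source": source, "target": target}))
    return queries


queries = build_merge_queries(
    ["iText2KG", "Neo4j"],
    [("iText2KG", "integrates with", "Neo4j")],
)
```

Each pair could then be executed against a database, for instance with `session.run(cypher, params)` from the official Neo4j Python driver. Using `MERGE` rather than `CREATE` suits incremental construction, since reprocessing a document does not duplicate nodes or edges.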

The system’s performance was tested on a variety of document types, demonstrating its versatility across different use cases without the need for retraining. iText2KG outperformed baseline methods, particularly in schema consistency, triplet extraction precision, and entity/relation resolution. The system achieved high consistency in structuring information from various types of documents, such as scientific articles, websites, and CVs. Precision in extracting relevant relationships was notably high when relation extraction used local entities, that is, the entities drawn from the semantic block being processed, as context, which kept irrelevant relations out of the knowledge graph.

Overall, iText2KG proved to be effective in constructing accurate and consistent knowledge graphs across multiple domains, adapting to different data types without the need for extensive fine-tuning or post-processing.
Source: https://www.marktechpost.com/2024/09/12/how-can-we-convert-unstructured-text-into-actionable-knowledge-this-ai-paper-unveils-itext2kg-for-incremental-knowledge-graphs-construction-using-large-language-models/