Information extraction from texts based on ontology and large language models

Authors: Sidorova E.A., Ivanov A.I., Ovchinnikova K.A.

Journal: Ontology of Designing (Онтология проектирования)

Section: Ontology engineering

Issue: vol. 15, no. 1 (55), 2025.


The article examines information extraction from texts using a subject-domain ontology combined with neural network text analysis methods, including large language models (LLMs). It discusses the expert's role in developing and maintaining such systems, using as an example the task of extracting information from analytical articles and building a computational linguistics ontology that represents the concepts most relevant to the system's user or customer. Building the ontology goes hand in hand with compiling a dictionary that forms its terminological core; methods are then needed to extract new terms in the given subject area. This task is treated as a named entity recognition (NER) problem, traditionally solved by training a neural network model on a representative dataset. The study compares this approach with an LLM-based methodology, for which lexical and syntactic patterns were developed, together with instruction templates for testing hypotheses about new multi-word terms and for verifying the results. The instructions developed for relation extraction also include automatically generated natural-language competency questions for each ontology relation. The novelty of the proposed approach lies in integrating ontological, linguistic, and neural network methods for extracting information from texts. The study shows that text analysis and information extraction tasks can be solved by a chain of large language models whose instructions are generated dynamically from the results of earlier analysis stages. The experiments achieved F1 = 0.8 for term extraction and classification and F1 = 0.87 for relation extraction.
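The abstract does not reproduce the authors' prompts, so the following is only a minimal sketch of the idea of a dynamically generated relation-extraction instruction. It assumes that an earlier stage has already produced a mapping from extracted terms to ontology classes, and that each relation has a simple question template. Every name here (`Relation`, `competency_question`, `build_relation_prompt`, the relations `usesMethod` and `evaluatedOn`, the template wording) is hypothetical and not taken from the article; the actual LLM call is omitted.

```python
from dataclasses import dataclass

# Hypothetical ontology relation: links a domain class to a range class.
# All names here are illustrative, not taken from the article.
@dataclass(frozen=True)
class Relation:
    name: str        # relation identifier, e.g. "usesMethod"
    domain_cls: str  # ontology class of the subject term
    range_cls: str   # ontology class of the object term
    verb: str        # verb phrase used to phrase the question


def competency_question(rel: Relation, subject: str) -> str:
    """Fill an assumed template to get a natural-language competency question."""
    return f"Which {rel.range_cls} does the {rel.domain_cls} \"{subject}\" {rel.verb}?"


def build_relation_prompt(text: str, terms: dict[str, str],
                          relations: list[Relation]) -> str:
    """Assemble a relation-extraction instruction from earlier-stage results.

    `terms` maps each term found by the (assumed) term-extraction stage to its
    ontology class. A relation is kept only if both its domain and range
    classes occur among those terms, so the instruction depends on what
    the previous stage produced.
    """
    present = set(terms.values())
    lines = [
        "Answer each question using only the text below.",
        "If the text gives no answer, reply 'none'.",
        "",
        f"Text: {text}",
        "",
    ]
    n = 0
    for rel in relations:
        if rel.domain_cls not in present or rel.range_cls not in present:
            continue  # the relation cannot be instantiated by these terms
        for term, cls in terms.items():
            if cls == rel.domain_cls:
                n += 1
                lines.append(f"{n}. [{rel.name}] {competency_question(rel, term)}")
    return "\n".join(lines)


# Example: the earlier stage found a task and a method but no dataset,
# so the hypothetical "evaluatedOn" relation is left out of the prompt.
relations = [
    Relation("usesMethod", "task", "method", "use"),
    Relation("evaluatedOn", "method", "dataset", "get evaluated on"),
]
terms = {"named entity recognition": "task", "BiLSTM-CRF": "method"}
prompt = build_relation_prompt("<article text>", terms, relations)
print(prompt)
```

In such a chain, the returned string would be passed to the next LLM and its answers parsed into relation instances. Filtering relations by the classes actually found keeps the instruction short and avoids asking questions that the extracted terms cannot answer.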


Keywords: information extraction, domain ontology, large language model, neural network models, prompt engineering

Short URL: https://sciup.org/170208812

IDR: 170208812   |   DOI: 10.18287/2223-9537-2025-15-1-114-129

Article type: research article