Data credibility when populating ontologies and knowledge graphs

Free access

This paper considers the problem of assessing trust in information extracted from textual sources to populate ontologies or knowledge graphs. A unit of information, or fact, is taken to be the minimum knowledge about an instance of the subject domain that can be expressed by a single RDF triple. The paper describes a probabilistic trust evaluation model based on Markov random processes. The model is built from the available information about the sources, taking previously extracted data into account. A method for assessing the credibility of information while weighting the sources in parallel is also presented. The proposed approach is useful when the quality of the data sources is unknown or unavailable. To test the model, numerical data sets of various sizes were generated automatically, and experiments were carried out to weight the sources and assess trust in the information extracted from them. In most cases, the source weights calculated by the proposed model were larger the smaller the average deviation of the source's information from the true values, and trust in a fact increased as its distance from the true data decreased. The model was also compared with data aggregation models: in most cases, aggregation based on the trust score showed the smallest average deviation from the true data among the models considered. The results indicate that the proposed model is effective in comparison with similar models and can be used to assess trust in facts represented by real numbers.
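
The idea of weighting sources while estimating numeric facts can be illustrated with a small sketch. This is not the paper's Markov-process model, whose details are not given in the abstract. It is a generic iterative weighting scheme chosen only for illustration, and the function name `weigh_sources`, its parameters, and the sample data are all hypothetical. Each fact is estimated as a weighted mean of the claimed values, and each source is then reweighted by the inverse of its average deviation from those estimates.

```python
def weigh_sources(claims, iterations=20, eps=1e-6):
    """Illustrative source weighting (NOT the paper's Markov model).

    claims: {source: {fact_id: numeric value}}; every source is assumed
    to provide at least one value. Returns (weights, estimates).
    """
    sources = list(claims)
    facts = sorted({f for values in claims.values() for f in values})
    # Start with equal weights, since source quality is unknown.
    weights = {s: 1.0 / len(sources) for s in sources}
    estimates = {}
    for _ in range(iterations):
        # Step 1: estimate each fact as the weighted mean of its claims.
        for f in facts:
            providers = [s for s in sources if f in claims[s]]
            total_w = sum(weights[s] for s in providers)
            estimates[f] = sum(weights[s] * claims[s][f] for s in providers) / total_w
        # Step 2: weight each source by the inverse of its mean deviation.
        raw = {}
        for s in sources:
            deviations = [abs(v - estimates[f]) for f, v in claims[s].items()]
            raw[s] = 1.0 / (sum(deviations) / len(deviations) + eps)
        norm = sum(raw.values())
        weights = {s: r / norm for s, r in raw.items()}
    return weights, estimates


# Hypothetical data: sources A and B are close to the truth, C is not.
truth = {"f1": 10.0, "f2": 20.0, "f3": 30.0}
claims = {
    "A": {"f1": 10.1, "f2": 19.9, "f3": 30.1},
    "B": {"f1": 9.9, "f2": 20.1, "f3": 29.9},
    "C": {"f1": 15.0, "f2": 25.0, "f3": 35.0},
}
weights, estimates = weigh_sources(claims)
```

In this toy setting the outlying source C ends up with the smallest weight, and the aggregated estimates lie closer to the true values than a plain average of the three sources. A scheme this simple tends to concentrate weight on a single source over many iterations, so it should be read only as an illustration of the weighting idea.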

Ontology, knowledge graph, data extraction, information trustworthiness, Markov process

Short URL: https://sciup.org/170198105

IDR: 170198105   |   DOI: 10.18287/2223-9537-2023-13-1-113-124

Research article