An approach based on TF-IDF metrics to extract the knowledge and relevant linguistic means on subject-oriented text sets

Free access

This paper addresses the problem of extracting knowledge units from sets of subject-oriented texts, where each such set is treated as a corpus. The main practical goal is to find the most rational way to express a knowledge fragment in a given natural language, so that it can then be recorded in the thesaurus and ontology of the subject area. The problem is important when building systems for processing, analysing, evaluating, and understanding information represented, in particular, by images. By applying the TF-IDF metric to classify the words of an initial phrase with respect to the given text corpora, we address the task of selecting the phrases closest to the initial one, either in the fragment of knowledge they describe or in the form in which that knowledge is expressed in the given natural language.
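To make the word-weighting idea concrete, the following is a minimal sketch of scoring the words of an initial phrase with the textbook TF-IDF formula (term frequency times the logarithm of the inverse document frequency). It is not the procedure from the paper, whose classification rules are not given in this abstract. The names `tokenize`, `tf_idf` and `phrase_profile`, the whitespace tokenizer, the toy texts, and the choice to keep each word's maximum score over the texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # also need lemmatization and punctuation handling.
    return text.lower().split()


def tf_idf(word, doc_tokens, all_docs):
    # Term frequency: share of the document's tokens equal to `word`.
    tf = Counter(doc_tokens)[word] / len(doc_tokens) if doc_tokens else 0.0
    # Document frequency: number of documents that contain `word`.
    df = sum(1 for tokens in all_docs if word in tokens)
    # Words absent from every document get a weight of zero.
    idf = math.log(len(all_docs) / df) if df else 0.0
    return tf * idf


def phrase_profile(phrase, corpus):
    # Map each word of the phrase to its highest TF-IDF value over the texts.
    docs = [tokenize(text) for text in corpus]
    return {w: max(tf_idf(w, d, docs) for d in docs) for w in tokenize(phrase)}


# Hypothetical toy corpus, used only to exercise the functions.
corpus = [
    "the image is analysed by the recognition system",
    "the recognition system estimates the image quality",
    "expert knowledge is stored in the thesaurus",
]
profile = phrase_profile("the thesaurus stores image knowledge", corpus)
for word, weight in sorted(profile.items(), key=lambda item: -item[1]):
    print(f"{word}: {weight:.3f}")
```

In this sketch a word that occurs in every text (here "the") receives a zero weight, while a word confined to one text scores highest. That contrast is the kind of signal that could separate general vocabulary from subject-specific terms. How the resulting weights are turned into word classes or used to select phrases is left open here.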

Pattern recognition, intelligent data analysis, information theory, open-form test assignment, natural-language expression of expert knowledge

Short URL: https://sciup.org/14059379

IDR: 14059379

Research article