On methods and models of keyword automatic extraction

Authors: Sheremetyeva S.O., Osminin P.G.

Journal: Bulletin of the South Ural State University. Series: Linguistics @vestnik-susu-linguistics

Published in: Vol. 12, No. 1, 2015.

Free access

The paper presents an overview and classification of the major approaches to automatic keyword extraction from text documents. The approaches fall into two types, statistical and hybrid, and each type can be further classified as corpus-based or document-based. The advantages and shortcomings of particular approaches are analyzed. It is claimed that applying statistical keyword extraction methods to inflecting languages, such as Russian, is problematic. Requirements for an efficient model of automatic keyword extraction from Russian texts are formulated, and specific recommendations for meeting these requirements are given. It is emphasized that creating effective keyword extractors requires taking into account the linguistic type of the natural language (analytical, inflecting, agglutinative, isolating), the domain (sublanguage), and the availability of linguistic and programming resources. The approach is illustrated by a case study of a keyword extractor for Russian texts on mathematical modeling.
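
The claim about inflecting languages can be illustrated with a minimal sketch. It assumes one commonly cited reason for the difficulty, which the abstract itself does not spell out: a purely frequency-based extractor counts every inflected form as a separate item, so the evidence for a single term is split across its word forms. The lemma table `TOY_LEMMAS` and both helper functions are hypothetical and serve only as an illustration; this is not the authors' extractor, and a real system would rely on a morphological analyzer rather than a hand-written table.

```python
from collections import Counter

# Hypothetical toy lemma table (an assumption for illustration only);
# a real system would obtain lemmas from a morphological analyzer.
TOY_LEMMAS = {
    "модель": "модель",
    "модели": "модель",
    "моделью": "модель",
    "моделей": "модель",
    "метод": "метод",
    "метода": "метод",
}


def count_surface_forms(tokens):
    """Count tokens exactly as they appear in the text."""
    return Counter(tokens)


def count_lemmas(tokens, lemmas=TOY_LEMMAS):
    """Count tokens after mapping each known word form to its lemma.

    Unknown tokens are kept unchanged.
    """
    return Counter(lemmas.get(token, token) for token in tokens)


# Six tokens: four forms of "модель" (model) and two forms of "метод" (method).
tokens = ["модель", "метод", "модели", "метода", "моделью", "моделей"]

surface_counts = count_surface_forms(tokens)
lemma_counts = count_lemmas(tokens)

# Top-ranked candidates under each counting scheme.
print(surface_counts.most_common(2))
print(lemma_counts.most_common(2))
```

In this toy input no surface form occurs more than once, so raw frequency cannot rank the candidates, whereas counting by lemma groups the four forms of "модель" together. This is one way to read the abstract's recommendation that an extractor take the linguistic type of the language into account.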


Automatic extraction, Russian

Short URL: https://sciup.org/147153946

IDR: 147153946 | UDC: 81’322