Efficiency of the use of cosine measure to determine the degree of document similarity

Author: Yatsko Vyacheslav

Journal: Грани познания @grani-vspu

Section: Information technology

Article in issue: 4 (69), 2020.

Free access

The article assesses how effectively the cosine measure determines document similarity in the task of authorship attribution of text documents. The source statistical data were the distributions of stop words in three works of fiction, two of which were written by the same author. It is shown that a more adequate result is obtained when the measure is applied to the deviations of stop-word frequencies from the Zipf distribution, provided that the source texts are pre-aligned.
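
The approach summarized above can be illustrated with a minimal Python sketch. It is not the author's implementation: the stop-word list, the function names, and the idealized Zipf curve (expected count at rank r equal to the top count divided by r) are assumptions made only for illustration, and the pre-alignment of the source texts is not shown because the abstract does not describe how it is done.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = ["the", "of", "and", "a", "to", "in"]


def stop_word_counts(text, stop_words=STOP_WORDS):
    """Count each stop word in text, in the order given by stop_words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in stop_words]


def zipf_deviations(counts):
    """Deviation of each observed count from an idealized Zipf curve.

    Assumption: the expected count at rank r is top_count / r, where
    ranks come from sorting the observed counts in descending order.
    """
    if not counts:
        return []
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    top = counts[order[0]]
    deviations = [0.0] * len(counts)
    for rank, index in enumerate(order, start=1):
        deviations[index] = counts[index] - top / rank
    return deviations


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def deviation_similarity(text_a, text_b):
    """Cosine measure applied to Zipf deviations of stop-word counts."""
    dev_a = zipf_deviations(stop_word_counts(text_a))
    dev_b = zipf_deviations(stop_word_counts(text_b))
    return cosine(dev_a, dev_b)
```

Calling `deviation_similarity(text_a, text_b)` on two texts returns a value between -1 and 1. Under the assumptions above, values closer to 1 would indicate more similar stop-word usage. Comparing raw counts instead only requires passing the output of `stop_word_counts` directly to `cosine`.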

Text document similarity, cosine measure, Zipf distribution, stop words, document classification

Short URL: https://sciup.org/148310508

IDR: 148310508

Research article