Efficiency of the use of cosine measure to determine the degree of document similarity
Author: Yatsko Vyacheslav
Journal: Грани познания @grani-vspu
Section: Information technologies
Article in issue: 4 (69), 2020.
Free access
The article assesses the efficiency of using the cosine measure to determine document similarity for the task of authorship attribution of text documents. The source statistical data were the distributions of stop words in three works of fiction, two of which were written by the same author. It is demonstrated that a more adequate result is obtained when this measure is applied to the deviations of stop-word frequencies from the Zipf distribution, provided that the source texts are pre-aligned.
Text document similarity, cosine measure, Zipf distribution, stop words, document classification
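The approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that the Zipf-expected count at rank r is total / (r · H_n), where H_n is the n-th harmonic number (exponent 1), and that "pre-aligned" means both texts are described over one shared stop-word list in one shared order. The function names and the stop-word counts are hypothetical and serve only as illustration; they are not data from the article.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def zipf_expected(total, n_ranks):
    """Expected counts under Zipf's law with exponent 1 (an assumption)."""
    harmonic = sum(1.0 / r for r in range(1, n_ranks + 1))
    return [total / (r * harmonic) for r in range(1, n_ranks + 1)]


def zipf_deviations(counts):
    """Observed minus Zipf-expected counts; rank = position in the list."""
    expected = zipf_expected(sum(counts), len(counts))
    return [obs - exp for obs, exp in zip(counts, expected)]


def aligned_counts(freqs, word_order):
    """Counts for one text, laid out in the shared stop-word order."""
    return [freqs.get(word, 0) for word in word_order]


# Hypothetical stop-word counts for two texts (illustration only).
text_a = {"the": 120, "and": 70, "of": 50, "to": 30}
text_b = {"the": 100, "and": 40, "of": 60, "to": 35}

# Assumed alignment: one shared order, by combined frequency, ties by word.
word_order = sorted(
    set(text_a) | set(text_b),
    key=lambda w: (-(text_a.get(w, 0) + text_b.get(w, 0)), w),
)

vec_a = aligned_counts(text_a, word_order)
vec_b = aligned_counts(text_b, word_order)

print("cosine on raw counts:     ", cosine_similarity(vec_a, vec_b))
print("cosine on Zipf deviations:",
      cosine_similarity(zipf_deviations(vec_a), zipf_deviations(vec_b)))
```

Raw stop-word counts are all positive and decrease with rank in a similar way for most texts, so their cosine tends to be close to 1 regardless of authorship. Subtracting the Zipf-expected counts removes that shared trend, which is one plausible reason why the deviations would discriminate better.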
Short URL: https://sciup.org/148310508
IDR: 148310508