Mathematical approach to the establishment of authorship and time of creation of text based on a study of entropy

Free access

This paper studies how the entropy and the frequency characteristics of a text depend on its time of creation and its authorship. It briefly reviews existing attribution methods and proposes a new approach based on analysis of an integral characteristic of the text, its entropy, which is one of the central quantities of information theory. The study uses literary prose texts written in Russian. The concept of entropy as an integral characteristic of a text is discussed in detail, and examples are given of how different works affect the frequency analysis. It is shown that not only frequency analysis but also its integral characteristic, entropy, can be used for text attribution, which can serve as the basis for a new approach to the problem of establishing the authorship and time of creation of a text. For authors from different centuries, a group of works was taken and the relationship between authorship, entropy, frequency-analysis values, and the year and century of writing was examined. It is shown that, for a single author, entropy follows a normal distribution. Using the method of least squares, a linear dependence of the time of creation of a work on the entropy of its text is derived, and the accuracy of the derived formula is calculated. Texts from the 17th, 18th, 19th and 20th centuries are discussed in detail on the basis of frequency analysis. The values obtained in the study are attributed to various factors that influenced the development of the Russian literary language; some of these factors are given in the paper. The error in determining the century of a work is calculated. Based on the study of the entropy of texts, it is shown that in many cases authors have non-overlapping or only slightly overlapping ranges of entropy values. This fact makes it possible to carry out a comparative analysis and to judge whether a particular work belongs to a particular author; the authorship of "And Quiet Flows the Don" was examined in this way. Because entropy is an objective characteristic of a text that does not depend on subjective assessments, and because analysis based on it is not time-consuming, it is possible to speak of a new mathematical approach to text attribution within the framework of information technology.
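
As an illustration of the quantity the abstract refers to, the following is a minimal sketch of Shannon entropy computed from the frequency analysis of a text. The abstract does not say which unit is counted or which logarithm base is used, so counting case-folded letters and measuring in bits are assumptions made here; the function names `letter_frequencies` and `shannon_entropy` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each letter in the text.

    Assumption: only alphabetic characters are counted, case-folded;
    the paper may count a different unit (e.g. words or all symbols).
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.items()}


def shannon_entropy(text: str) -> float:
    """Shannon entropy H = -sum(p * log2(p)) in bits per letter.

    Assumption: base-2 logarithm; an empty text yields 0.
    """
    return sum(-p * math.log2(p) for p in letter_frequencies(text).values())
```

Because `str.isalpha` is Unicode-aware, the same sketch applies to Cyrillic text without changes. Comparing the values obtained for several works by one author with those of another author would then be one way to look at the overlapping or non-overlapping entropy ranges the abstract mentions.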


Entropy, frequency analysis, text attribution

Short URL: https://sciup.org/14729942

IDR: 14729942

Research article