On the parametric model of word-length distribution, exemplified by literary texts in Spanish, Italian and Swedish


We study the regularities governing the relative frequencies of word lengths when the full series of relative frequencies is divided into several segments. Here n denotes the length of a word (the number of letters in it). For Spanish there are four segments: lengths 1-2 (a linear function with positive slope); lengths 3-5 (a second-order polynomial opening upward); lengths 6-11 (a linear function with negative slope); and lengths 12 and more (a geometric progression with common ratio less than 1). For Italian there are also four segments: lengths 1-3 and 4-6 (second-order polynomials opening downward); lengths 7-11 (a geometric progression with common ratio less than 1); and lengths 12 and more (a geometric progression with common ratio less than 1). For Swedish there are three segments: lengths 1-3 (a second-order polynomial opening upward); lengths 4-6 (a second-order polynomial opening downward); and lengths 7 and more (a geometric progression with common ratio less than 1). The coefficients of these equations are parameters that can be estimated for a given text from its statistical characteristics. Five texts in each of Spanish and Swedish and six texts in Italian were examined. All texts in each language were then combined into a single text, and the distribution for that combined text was analyzed.
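To illustrate how such a piecewise model might be fitted to a text, here is a minimal Python sketch. It assumes that word length is counted in letters only (digits and punctuation are ignored), encodes the segment boundaries and functional forms listed above, and estimates each segment's coefficients by ordinary least squares; for a geometric tail f(n) = a·q^n it fits a straight line to ln f(n). Least squares is a swapped-in assumption: the abstract says only that the coefficients are estimated from the text's statistical characteristics, so the authors' actual procedure may differ. The function names (`relative_length_frequencies`, `fit_segment`, `fit_model`) are hypothetical.

```python
import math
import re
from collections import Counter

# Segments as listed in the abstract: (first length, last length or None
# for "and more", functional form).
SEGMENTS = {
    "spanish": [(1, 2, "linear"), (3, 5, "quadratic"),
                (6, 11, "linear"), (12, None, "geometric")],
    "italian": [(1, 3, "quadratic"), (4, 6, "quadratic"),
                (7, 11, "geometric"), (12, None, "geometric")],
    "swedish": [(1, 3, "quadratic"), (4, 6, "quadratic"),
                (7, None, "geometric")],
}


def relative_length_frequencies(text):
    """Map word length n (letters only, an assumption) to its relative frequency."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return {n: c / total for n, c in sorted(counts.items())}


def _solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    m = len(b)
    rows = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        # Partial pivoting: bring the row with the largest entry up.
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][m] / rows[i][i] for i in range(m)]


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...], lowest power first."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return _solve(a, b)


def fit_segment(freqs, first, last, form):
    """Fit one segment (least squares, assumed); None if too few lengths occur."""
    pts = [(n, f) for n, f in freqs.items()
           if n >= first and (last is None or n <= last) and f > 0]
    xs = [n for n, _ in pts]
    ys = [f for _, f in pts]
    if form == "geometric":
        if len(pts) < 2:
            return None
        # f(n) = a * q**n  <=>  ln f(n) = ln a + n ln q, a straight line in n.
        c0, c1 = polyfit(xs, [math.log(y) for y in ys], 1)
        return {"a": math.exp(c0), "q": math.exp(c1)}
    degree = 1 if form == "linear" else 2
    if len(pts) < degree + 1:
        return None
    return {"coeffs": polyfit(xs, ys, degree)}


def fit_model(text, language):
    """Return one (first, last, form, params) tuple per segment of the language."""
    freqs = relative_length_frequencies(text)
    return [(first, last, form, fit_segment(freqs, first, last, form))
            for first, last, form in SEGMENTS[language]]
```

For example, `fit_model(text, "spanish")` returns one tuple per segment, with `params` set to `None` when a segment contains too few distinct lengths to determine its coefficients. To mirror the combined-text analysis, the texts of one language can be concatenated before the call. The signs of the fitted slopes and leading coefficients, and whether q < 1, can then be checked against the shapes stated for each segment. A small normal-equation solver keeps the sketch dependency-free; in practice `numpy.polyfit` would be the usual choice.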


Keywords: text in Spanish, text in Italian, text in Swedish, word length, parametric model of word-length distribution

Short URL: https://sciup.org/14111682

IDR: 14111682   |   DOI: 10.5281/zenodo.842975

Research article