Clustering Russian Federation regions according to the level of socio-economic development with the use of machine learning methods

Authors: Ketova Karolina V., Kasatkina Ekaterina V., Vavilova Diana D.

Journal: Economic and Social Changes: Facts, Trends, Forecast @volnc-esc

Section: Regional economy

Published in issue: No. 6, Vol. 14, 2021.

Free access

The paper addresses the problem of clustering the regions of the Russian Federation by level of socio-economic development, taking into account the sectoral structure of gross regional product (GRP). Classical machine learning methods serve as the tool for solving the clustering problem. The object of the study is the differentiation of regions across various socio-economic indicators. The subject of the study is the practice of using machine learning methods for clustering objects. The initial database includes actual statistical data on the socio-economic development of RF constituent entities and the sectoral structure of their gross regional product as of 2019. We identify clusters of regions using machine learning methods implemented in Python, with data-processing libraries such as Pandas, Sklearn, and SciPy. The initial data were preprocessed: categorical data were encoded numerically, indicators were converted to specific (per-unit) values, and the indicators were standardized. The initial data set for 2019 contains 5,525 records on 65 indicators of socio-economic development for 85 regions of the Russian Federation. Using principal component analysis, we identify 15 basic indicators of a region's socio-economic development.
Based on these indicators, k-means clustering yields five regional clusters: the first cluster is characterized by a high share of wholesale and retail trade, real estate transactions, and professional, scientific and technological activities in the GRP structure; the second cluster specializes in manufacturing, wholesale and retail trade, real estate transactions, agriculture and forestry; the third cluster can be described as a mixed economy, with values close to the Russian Federation averages for the main socio-economic indicators; regions of the fourth cluster show a high level of unemployment and a high share of public administration, military and social security; the fifth cluster specializes in mining.
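
The workflow summarized in the abstract (standardization, principal component analysis, k-means with five clusters) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic random numbers shaped like the described set (85 regions by 65 indicators, which gives the 5,525 records mentioned), the indicator and region names are hypothetical, and the rule for picking 15 indicators from the PCA loadings is an assumed one, since the abstract does not state how the selection was made.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real statistics (assumption): 85 regions x 65
# indicators, i.e. 85 * 65 = 5,525 values. All names are hypothetical.
rng = np.random.default_rng(0)
n_regions, n_indicators = 85, 65
data = pd.DataFrame(
    rng.normal(size=(n_regions, n_indicators)),
    index=[f"region_{i}" for i in range(n_regions)],
    columns=[f"indicator_{j}" for j in range(n_indicators)],
)

# Standardize every indicator to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(data),
    index=data.index,
    columns=data.columns,
)

# Fit PCA, then score each indicator by its absolute loadings weighted by the
# explained-variance ratio of each component. Both the number of components
# (10) and this scoring rule are assumptions made for the sketch.
pca = PCA(n_components=10, random_state=0).fit(scaled)
importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
selected = pd.Series(importance, index=scaled.columns).nlargest(15).index

# Cluster the regions into five groups using only the 15 selected indicators.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = pd.Series(
    kmeans.fit_predict(scaled[selected]), index=scaled.index, name="cluster"
)

# Mean standardized value of each selected indicator per cluster; on real data
# such profiles are what one would read to characterize the clusters.
profiles = scaled[selected].groupby(labels).mean()
print(labels.value_counts().sort_index())
```

On real data, `profiles` would be the table to inspect when describing clusters in terms such as a high share of mining or of public administration in GRP; with the random input used here the clusters carry no economic meaning.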


Keywords: socio-economic indicators, industry structure, gross regional product, machine learning, cluster analysis, principal component analysis

Short URL: https://sciup.org/147236375

IDR: 147236375   |   DOI: 10.15838/esc.2021.6.78.4

Research article