Method of finding related indicators based on analysis of regulatory legal acts by NLP methods

Modern time series forecasting methods can produce accurate, high-quality forecasts when retrospective data are available. However, their results depend on the volume and quality of the training sample. When a time series is missing, contains only a few points, or is unreliable, time series forecasting methods are ineffective. In such cases, the usual approach is to find other indicators that correlate with the target one, hereinafter referred to as indirect indicators. In the course of work on forecasting socio-economic indicators, it became necessary to compile a list of indirect indicators; however, the available solutions to this task do not provide the required reliability. Most of them rely on data from social networks, forums and other sources that cannot be considered objective, since such data express subjective points of view and may be subject to deliberate falsification and distortion. These risks are unacceptable when developing a system intended to support managerial decision-making at the state level.

Aim. To develop methods for finding indirect indicators based on objective sources of information. These methods make it possible to compile a list of indirect indicators without involving experts and eliminate the risk of inaccurate primary data.

Materials and methods. The study was based on the regulatory legal acts of the Russian Federation and its constituent entities. This source was chosen because regulatory documents are objective, fundamental state documents rather than an expression of the subjective point of view of an author or a group of persons. For the experiment, part of the regulatory framework for 2016 to 2021 was collected, covering categories such as agriculture, medicine, the social sphere and others.

Results. A method for finding indirect indicators is defined, several algorithms for ranking indirect indicators are developed and tested, and indirect indicators are compiled for several socio-economic indicators. Indirect indicators are identified by applying Data Mining and NLP methods to a database of regulatory legal acts of the Russian Federation.

Conclusion. The resulting solution made it possible to compile a list of N-grams associated with the target indicator. At this stage, an expert is still needed to interpret each N-gram as an indicator; however, this does not require competence in the subject area of the indicator.
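The abstract does not specify which ranking algorithms were used. As one illustration of the general idea (ranking N-grams from regulatory texts by how strongly they are associated with a target indicator), the sketch below scores N-grams by document-level pointwise mutual information (PMI) with the documents that mention the indicator. PMI, the function names `rank_related_ngrams`, `distinct_ngrams` and `contains_phrase`, and the toy corpus are all hypothetical choices for illustration, not the authors' method.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters (matches Cyrillic as well as Latin).
    return re.findall(r"\w+", text.lower())


def contains_phrase(tokens: list[str], phrase: list[str]) -> bool:
    # True if `phrase` occurs as a contiguous token subsequence of `tokens`.
    k = len(phrase)
    return any(tokens[i:i + k] == phrase for i in range(len(tokens) - k + 1))


def distinct_ngrams(tokens: list[str], max_n: int) -> set[str]:
    # Distinct 1..max_n-grams of one document, joined with spaces.
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def rank_related_ngrams(docs: list[str], target: str, max_n: int = 2,
                        min_count: int = 2) -> list[tuple[str, float]]:
    """Hypothetical ranking: document-level PMI between each n-gram and `target`."""
    target_tokens = tokenize(target)
    target_vocab = set(target_tokens)
    doc_freq = Counter()  # number of documents containing each n-gram
    co_freq = Counter()   # same count, restricted to documents mentioning the target
    n_target_docs = 0
    for text in docs:
        tokens = tokenize(text)
        grams = distinct_ngrams(tokens, max_n)
        doc_freq.update(grams)
        if contains_phrase(tokens, target_tokens):
            n_target_docs += 1
            co_freq.update(grams)
    n_docs = len(docs)
    scored = []
    for gram, c_gt in co_freq.items():
        # Skip rare co-occurrences and n-grams made only of the target's own words.
        if c_gt < min_count or set(gram.split()) <= target_vocab:
            continue
        # PMI = log(P(gram, target) / (P(gram) * P(target))) over documents.
        pmi = math.log(c_gt * n_docs / (doc_freq[gram] * n_target_docs))
        scored.append((gram, pmi))
    # Highest PMI first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = [
        "unemployment rate rose while benefit payments grew",
        "the unemployment rate and benefit payments",
        "grain exports grew",
        "the grain harvest",
    ]
    for gram, score in rank_related_ngrams(corpus, "unemployment rate"):
        print(f"{gram}: {score:.3f}")
```

On this toy corpus, "benefit", "benefit payments" and "payments" come out on top because they appear only in documents that mention the target phrase. Real regulatory texts, especially in Russian, would additionally need stop-word filtering and lemmatization, which this sketch deliberately omits.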

Keywords: socio-economic indicators, N-gram, VDL activity indicator, data mining, NLP

Short address: https://sciup.org/147236517

IDR: 147236517   |   DOI: 10.14529/ctcr220107

Scientific article