Applying machine learning for prediction of cardiovascular diseases on small data sets
Authors: Kaledina Elena Alexandrovna, Kaledin Oleg Evgenievich, Kulyagina Taisiya Ivanovna
Journal: Проблемы информатики (Problems of Informatics)
Section: Applied Information Technologies
Issue: 1 (54), 2022.
As computing power grows and ever larger amounts of data are generated, artificial intelligence algorithms are being actively applied to a wide range of medical tasks. One of the most important areas where artificial intelligence can help is the diagnosis of diseases and the prediction of their possible outcomes. Cardiovascular diseases are among the leading causes of mortality and disability in most countries of the world, including the Russian Federation. The most important risk factor for the two major cardiovascular diseases (myocardial infarction and cerebral stroke) is arterial hypertension. Therefore, the main task of primary prevention of cardiovascular diseases (CVD) is the timely detection of a high risk of fatal CVD in patients diagnosed with uncomplicated arterial hypertension. Machine learning algorithms can help address this problem and significantly improve the accuracy of predicting cardiovascular diseases and their complications. Machine learning methods are the main tool of artificial intelligence: they make it possible to automate the processing and analysis of big data, identify hidden or non-obvious patterns in it, and extract new knowledge. This article describes the use of machine learning algorithms to predict the risk of major adverse cardiovascular events in patients diagnosed with arterial hypertension over the next 12, 24, and 36 months. The analysis included 16 predictors that combine standard cardiovascular risk indicators (age, male sex, smoking, elevated cholesterol, impaired uric acid metabolism) with several more specific indicators. A distinctive feature of this task is that the training data set consists of local data collected in a single region of the Russian Federation.
This can make the predictive model better adapted to possible local features of cardiovascular disease development, but it also has a significant drawback: the small amount of training data makes the model prone to overfitting and, as a result, reduces its ability to generalize. The target in the study is a binary vector indicating major adverse cardiovascular events at three reference time points. Because censoring and some of the cardiovascular events considered can occur simultaneously or recur during all or part of the observation period, the study is formally posed as a multilabel classification problem. The paper describes the stages of forming the data set and explores predictive machine learning algorithms on small data sets in order to build a model for calculating cardiovascular risk. It shows the advantages and disadvantages of several multilabel machine learning methods (binary relevance, multioutput classifier, label powerset, MLkNN, classifier chain) for developing predictive algorithms under the conditions of this problem. The results show that, on a small data set, the multioutput classifier and label powerset methods performed best among all the methods analyzed for assessing the development of cardiovascular diseases. This motivates studying these methods on larger samples that include a broader set of risk factors.
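The abstract names several multilabel strategies without showing how they differ. The sketch below contrasts three of them (binary relevance, label powerset, classifier chain) in pure Python with a hand-written k-nearest-neighbour base learner on synthetic data. It is an illustration under stated assumptions, not the authors' pipeline: the two features, the toy patients, the function names, k = 3, and the label definition ("an event occurred by month 12/24/36", with censoring and recurrence ignored) are all hypothetical. For real work, library implementations such as scikit-learn's `ClassifierChain` or scikit-multilearn's `LabelPowerset` and `MLkNN` would be the natural choice.

```python
from collections import Counter
from math import dist

# Hypothetical label definition: label h = 1 if an adverse event occurred by month h.
# Censoring and recurrent events, which the study handles, are ignored in this sketch.
HORIZONS = (12, 24, 36)


def make_labels(event_month):
    """Return a 3-label binary tuple; event_month is None if no event was observed."""
    return tuple(int(event_month is not None and event_month <= h) for h in HORIZONS)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (unscaled Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def binary_relevance_predict(X, Y, x, k=3):
    """One independent classifier per label; dependencies between labels are ignored."""
    return tuple(knn_predict(X, [y[j] for y in Y], x, k) for j in range(len(Y[0])))


def label_powerset_predict(X, Y, x, k=3):
    """Each distinct label tuple seen in training is treated as one class."""
    return knn_predict(X, Y, x, k)


def classifier_chain_predict(X, Y, x, k=3):
    """Classifier j sees the features plus labels 0..j-1: true labels during training,
    the chain's own earlier predictions at query time."""
    preds = []
    for j in range(len(Y[0])):
        Xj = [tuple(xi) + y[:j] for xi, y in zip(X, Y)]
        preds.append(knn_predict(Xj, [y[j] for y in Y], tuple(x) + tuple(preds), k))
    return tuple(preds)


# Synthetic toy data (not from the study): two scaled hypothetical features per patient.
TRAIN_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.85),
           (0.5, 0.5), (0.55, 0.45), (0.45, 0.55)]
EVENT_MONTHS = [None, None, None, 6, 10, 8, 30, 34, 28]
TRAIN_Y = [make_labels(m) for m in EVENT_MONTHS]

if __name__ == "__main__":
    query = (0.5, 0.52)
    print(binary_relevance_predict(TRAIN_X, TRAIN_Y, query))
    print(label_powerset_predict(TRAIN_X, TRAIN_Y, query))
    print(classifier_chain_predict(TRAIN_X, TRAIN_Y, query))
```

For the query (0.5, 0.52), all three strategies return (0, 0, 1) on this toy set. They diverge in general: binary relevance can output label combinations never seen in training, whereas label powerset can only predict combinations present in the training set, which on a small sample may cover only a few of the 2^3 = 8 possible tuples.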
Keywords: machine learning algorithms, data analysis, cardiovascular disease prediction
Short URL: https://sciup.org/143179066
IDR: 143179066 | DOI: 10.24412/2073-0667-2022-1-66-76