Comparison of machine learning models for coronavirus prediction
Authors: Amos B.K., Smirnov I.V., Hermann M.M.
Journal: Vestnik of Don State Technical University @vestnik-donstu
Section: Information technology, computer science, and management
Published in: No. 1, Vol. 22, 2022.
Free access
COVID-19, the disease caused by the coronavirus SARS-CoV-2, was first detected in Wuhan, China, in December 2019. Coronaviruses are a family of viruses that cause illnesses ranging from the common cold to severe acute respiratory syndrome (SARS). The symptoms of COVID-19 resemble those of a cold or seasonal allergies. Like other respiratory viruses, SARS-CoV-2 is transmitted mainly through airborne droplets released when coughing or sneezing. Because the symptoms are nonspecific, diagnosing COVID-19 requires careful laboratory analysis, and reducing the resources needed for diagnosis is a major challenge. On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a pandemic, as cases had increased exponentially worldwide and demand for intensive care beds and related facilities had far exceeded existing capacity. Regions of Italy were among the first examples of this. Brazil registered its first case of SARS-CoV-2 on February 26, 2020. Transmission of the virus in the country shifted very quickly from imported cases to local and, finally, community transmission, and the Brazilian federal government announced nationwide community transmission on March 20, 2020. As of March 23, the city of São Paulo, which has a population of about 12 million and is home to the Israelita Albert Einstein Hospital, had registered 477 cases and 30 related deaths; by March 27, the state of São Paulo had registered 1,223 cases of COVID-19 with 68 related deaths. To slow the spread of the virus in the state of São Paulo, quarantine and social distancing measures were introduced. One motivation for this study is that, in an overstretched healthcare system with limited SARS-CoV-2 testing capacity, it is not practical to test every case, and testing must be limited to a target subpopulation. The objective of the study is to build a machine learning model that can predict a positive SARS-CoV-2 test result from medical data.
To this end, several machine learning classification models are compared, and the one that best predicts coronavirus infection is identified. The comparison focuses on individuals in class 1, i.e., those with a positive test. The task is therefore to find the machine learning model with the best recall and F1 score for class 1.
Materials and Methods. An open-source data set from the Israelita Albert Einstein Hospital in São Paulo, Brazil, was taken as a basis. The following machine learning models were used in the study: Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and AdaBoost (AB), together with 10-fold cross-validation. The performance measures evaluated were accuracy, recall, and F1 score.
Results. Out of a total of 5,644 people tested during the COVID-19 pandemic, 5,086 tested negative and 558 tested positive. The support vector machine showed the best results in detecting coronavirus, with a recall of 75 % and an F1 score of 60 %, compared to the other models: RF, KNN, LR, AB, and DT.
Discussion and Conclusions. The AB algorithm achieved the highest accuracy, but the LSVM algorithm was more stable. LSVM can therefore be recommended as a useful tool for detecting COVID-19.
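The reason for judging the models on class 1 recall and F1 rather than on accuracy can be illustrated with the figures given in the abstract. The following is a minimal sketch, assuming the standard definitions of precision, recall, and F1; the function names `class1_metrics` and `precision_from_recall_f1` are illustrative and do not come from the paper. The implied precision further assumes that the reported recall and F1 score refer to the same classifier and the same evaluation data.

```python
def class1_metrics(tp, fp, fn, tn):
    """Accuracy plus precision, recall and F1 for the positive class (class 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this recall/F1 pair")
    return f1 * recall / denominator


# Class counts stated in the abstract.
NEGATIVE, POSITIVE = 5086, 558

# Hypothetical baseline: a classifier that labels everyone as negative.
baseline = class1_metrics(tp=0, fp=0, fn=POSITIVE, tn=NEGATIVE)
print(f"baseline accuracy = {baseline['accuracy']:.3f}, "
      f"recall = {baseline['recall']:.2f}, F1 = {baseline['f1']:.2f}")

# Precision implied by the reported SVM recall (75 %) and F1 score (60 %).
implied_precision = precision_from_recall_f1(recall=0.75, f1=0.60)
print(f"implied precision = {implied_precision:.2f}")
```

With roughly 10 % positives, the always-negative baseline already reaches about 90 % accuracy while detecting no positive case, so accuracy alone says little about class 1. Under the stated assumption, a recall of 75 % combined with an F1 score of 60 % corresponds to a precision of about 50 %, meaning roughly half of the predicted positives would be true positives.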
Keywords: COVID-19 detection, classification, machine learning models
Short URL: https://sciup.org/142234456
IDR: 142234456