Prediction of Vasculitic Neuropathy Using Supervised Machine Learning Approaches

Vasculitic neuropathy is an inflammation-mediated disorder of the peripheral nerves that often remains undiagnosed until irreversible damage has occurred. This study developed and validated a supervised machine learning model to predict future onset of vasculitic neuropathy using electronic health record data from 450 cases and 1,800 matched controls. The predictive algorithm analyzed 134 structured features derived from diagnoses, medications, laboratory tests, and clinical notes. The selected model, L2-regularized logistic regression, achieved an AUC of 0.92 (CI 0.89–0.94) within the development sample and maintained an AUC of 0.90 (CI 0.84–0.93) in a temporal validation cohort. At the optimal operating threshold, external sensitivity was 0.81 and specificity 0.79. In the highest-risk decile, the positive predictive value reached 47%. Key features driving predictions included inflammatory markers, neuropathic symptoms, and vascular imaging patterns. This methodology demonstrates the feasibility of using machine learning to detect impending vasculitic neuropathy early, before confirmatory biopsy, so that treatment can start promptly and outcomes can improve.

Keywords: vasculitic neuropathy, machine learning, predictive modeling, electronic health records, diagnostic accuracy

Short address: https://sciup.org/14129601

IDR: 14129601   |   DOI: 10.47813/2782-5280-2024-3-1-0301-0310

Vasculitic neuropathy is a rare and disabling condition caused by inflammation-driven damage to the small blood vessels supplying the peripheral nervous system. Because symptoms at onset are nonspecific and highly variable, it often goes undiagnosed until irreversible nerve injury has occurred. Patients may present with complaints ranging from numbness, tingling, and burning pain to dizziness, gastrointestinal problems, muscle weakness, or even paralysis when motor nerves are affected.

By the time a nerve biopsy or angiogram confirms the diagnosis of vasculitis, the patient has often already suffered permanent loss of sensory, motor, or autonomic nerve function. Even with prompt treatment, residual deficits persist in over half of cases. Because the condition is difficult to recognize early, the time from symptom onset to diagnosis often spans months. This underscores the need for noninvasive tools that can predict impending vasculitic neuropathy while nerve involvement is still in its earliest phases.

Machine learning methods that detect predictive patterns in multifaceted patient data are promising tools for raising earlier suspicion of vasculitic neuropathy. By learning from historical electronic medical records, supervised classification algorithms can potentially identify the individuals at highest risk of developing neuropathy. The overarching aim is to build automated models, using commonly available clinical data, that flag high-probability patients for vasculitis-specific testing. Starting appropriate immunotherapy at first suspicion, rather than waiting for traditional biopsy results, could allow treatment before irreversible nerve destruction [1-8].

Early prediction could both prompt faster confirmation of the underlying diagnosis and help prevent permanent neurological deficits. Such predictive models are not intended to replace physician judgement; rather, they organize complex combinations of symptoms, examination findings, and test results into a clinically actionable framework that supports earlier decisions. This study therefore set out to develop and validate a machine learning approach that uses routine electronic health data to predict future onset of vasculitic neuropathy before irreparable nerve damage occurs, the point at which diagnosis is typically made today. Overall, this methodology could change the diagnostic paradigm for this rare but devastating condition [9-13].

METHODS

Study Population

The model development cohort consisted of electronic health record (EHR) data from patients treated at three large hospitals in the University of California health system between 2010 and 2020. Cases were defined as patients with confirmed evidence of vasculitis, either on nerve tissue pathology (biopsy) or by angiographic demonstration of vascular inflammation, together with a physician diagnosis of peripheral neuropathy or another neurological deficit (identified by ICD-9/10 codes). To enrich for early disease, pre-specified inclusion criteria restricted cases to those whose neurological symptoms had been present for less than 6 months at the time of diagnosis. Four controls per case were randomly selected from the same facilities after confirming the absence of any neuropathy or vasculitis diagnosis and of related medications. Matching criteria included age (±5 years), gender, location, and length of EHR history before the index date (the date of the case's diagnosis). In total, 450 cases were identified and matched to 1,800 controls on both demographic and temporal factors [14-18].
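The 4:1 matched sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the `Patient` record, the function name `sample_matched_controls`, exact matching on sex and site, the ±1-year tolerance on record-history length, and drawing controls without replacement are all hypothetical choices, and the control pool is assumed to have already been filtered for neuropathy or vasculitis diagnoses and related medications.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    sex: str
    site: str
    history_years: float  # length of EHR history before the index date


def sample_matched_controls(cases, pool, k=4, age_tol=5, history_tol=1.0, seed=0):
    """Return {case_pid: [control_pid, ...]} with k controls per case.

    Controls must share sex and site, lie within +/- age_tol years of age and
    +/- history_tol years of record history, and are never reused across cases.
    `pool` is assumed to be pre-filtered (no neuropathy/vasculitis codes or drugs).
    """
    rng = random.Random(seed)
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            p for p in pool
            if p.pid not in used
            and p.sex == case.sex
            and p.site == case.site
            and abs(p.age - case.age) <= age_tol
            and abs(p.history_years - case.history_years) <= history_tol
        ]
        if len(eligible) < k:
            raise ValueError(f"only {len(eligible)} eligible controls for case {case.pid}")
        chosen = rng.sample(eligible, k)  # random draw among eligible controls
        used.update(p.pid for p in chosen)
        matches[case.pid] = [p.pid for p in chosen]
    return matches
```

Sampling without replacement keeps each control tied to a single case, which simplifies later matched analyses; a real pipeline would also need a policy for cases with too few eligible controls instead of simply raising an error.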

Feature Selection

The EHR data extracts included diagnostic billing codes, narrative clinical notes, historical and recent laboratory orders and results, prescribed outpatient medications, and imaging reports. Natural language processing (NLP) was first used to convert free-text clinical notes into quantitative features indicating the presence of signs, symptoms, diagnoses, or characteristics pertinent to vasculitis or neuropathy. All data elements were then aggregated to the patient level regardless of the number of visits, with their timing preserved relative to the index date. Feature selection, guided by domain knowledge from collaborating neurologists and rheumatologists, yielded 134 clinically relevant variables expected to predict vasculitic neuropathy risk. Variables fell into the following categories: demographics, diagnosed comorbidities, neuropathy- or vasculitis-related symptoms, physical examination findings, diagnostic test results (laboratory, imaging, and electrodiagnostic studies), and medication history [19].
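A simplified sketch of patient-level feature construction that respects the index date follows. The study's NLP pipeline is not described, so a crude keyword lexicon with sentence-level negation is swapped in purely for illustration; `SYMPTOM_TERMS`, `mentions`, `patient_features`, and the input tuple layouts are hypothetical, and only records dated strictly before the index date are used.

```python
import re
from datetime import date
from statistics import median

# Hypothetical symptom lexicon (regular expressions); not the study's NLP system.
SYMPTOM_TERMS = {
    "paresthesia": r"\bparesthesias?\b|\btingling\b|\bpins and needles\b",
    "numbness": r"\bnumb(ness)?\b",
    "burning_pain": r"\bburning pain\b",
}
# A negation cue earlier in the same sentence (no period in between).
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*$", re.IGNORECASE)


def mentions(text, pattern):
    """True if `pattern` occurs at least once without a preceding negation cue."""
    for m in re.finditer(pattern, text, re.IGNORECASE):
        if not NEGATION.search(text[: m.start()]):
            return True
    return False


def patient_features(labs, notes, index_date):
    """Collapse visit-level records into one feature dict per patient.

    labs:  list of (date, test_name, value)
    notes: list of (date, free_text)
    Only records dated strictly before index_date are used, so nothing recorded
    at or after the diagnosis leaks into the features.
    """
    prior_labs = [(d, name, v) for d, name, v in labs if d < index_date]
    prior_notes = [text for d, text in notes if d < index_date]
    feats = {}
    for test in ("ESR", "CRP"):
        values = [v for _, name, v in prior_labs if name == test]
        feats[f"{test.lower()}_median"] = median(values) if values else None
    for symptom, pattern in SYMPTOM_TERMS.items():
        feats[f"note_{symptom}"] = int(any(mentions(t, pattern) for t in prior_notes))
    return feats
```

Summarizing repeated measurements with a median and symptoms with "ever mentioned" flags is one simple way to make patients with different visit counts comparable; the actual aggregation rules used in the study are not specified.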

Machine Learning Model

The full dataset was divided into training (80%), validation (10%), and test (10%) splits, preserving the case:control ratio and demographic balance across partitions. Several supervised classification algorithms were evaluated on the training data: L2-penalized logistic regression, random forest, gradient boosting machine, and deep neural networks. Hyperparameters were tuned by cross-validated grid search, with predictive performance measured by the area under the receiver operating characteristic curve (AUC). The final model was the one that gave the best discrimination (balance of sensitivity and specificity) on the held-out validation set. Its output is a 12-month risk probability that the patient will develop biopsy- and clinically confirmed vasculitic neuropathy [20-21].
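The split-and-tune step can be sketched with the standard library alone. This is a simplified, hypothetical illustration: it stratifies by outcome so each partition keeps the case:control ratio, fits L2-penalized logistic regression by plain gradient descent, and, for brevity, picks the penalty strength by AUC on the validation split instead of the cross-validated grid search the study reports. Function names, the learning rate, the epoch count, and the penalty grid are assumptions.

```python
import math
import random


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split indices into (train, val, test), applying the same fractions within
    each class so every partition keeps the overall case:control ratio."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, X):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X]


def fit_l2_logistic(X, y, lam, lr=0.1, epochs=300):
    """Minimize mean log-loss + (lam / 2) * ||w||^2 by full-batch gradient descent.
    The intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        p = predict(w, b, X)
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, pi in zip(X, y, p):
            err = pi - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def select_lambda(X, y, train, val, grid=(0.001, 0.01, 0.1, 1.0)):
    """Fit one model per penalty on the training split; keep the best validation AUC."""
    Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
    Xva, yva = [X[i] for i in val], [y[i] for i in val]
    best = None
    for lam in grid:
        w, b = fit_l2_logistic(Xtr, ytr, lam)
        score = auc(predict(w, b, Xva), yva)
        if best is None or score > best[0]:
            best = (score, lam, w, b)
    return best
```

With only 10% of 2,250 patients (roughly 45 cases) in a validation split, a single validation AUC is noisy, which is one reason to tune with cross-validation on the training data as the study describes.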

External Validation

The final model was temporally validated on more recent clinical data from 2016-2020 that had been completely withheld from model development and hyperparameter tuning. Predicted risk scores were compared against diagnoses of vasculitic neuropathy recorded in the EHR extracts during the 12 months after each prediction. Preserved discrimination in this cohort would support the model's generalizability to unseen patient populations.
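Temporal validation and 12-month outcome labeling could be implemented roughly as below. This sketch assumes each record carries a `prediction_date`, that the outcome window runs from the day after prediction through 365 days later, and that the cutoff separating development from held-out data is a parameter; `outcome_within_window` and `temporal_split` are hypothetical helpers, and none of these details are specified in the study.

```python
from datetime import date, timedelta


def outcome_within_window(prediction_date, diagnosis_date, days=365):
    """1 if a diagnosis is recorded after the prediction date and within `days` of it, else 0."""
    if diagnosis_date is None:
        return 0
    return int(prediction_date < diagnosis_date <= prediction_date + timedelta(days=days))


def temporal_split(records, cutoff):
    """Development set: predictions made before `cutoff`; held-out set: on or after it."""
    development = [r for r in records if r["prediction_date"] < cutoff]
    held_out = [r for r in records if r["prediction_date"] >= cutoff]
    return development, held_out
```

Splitting on calendar time rather than at random tests whether the model still works when coding practices, laboratory assays, and patient mix drift over time.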

RESULTS

Study Population

After inclusion criteria were applied, the final cohort consisted of 450 biopsy-confirmed cases of vasculitic neuropathy matched to 1,800 controls without evidence of vasculitis or neuropathy. Cases and controls showed no statistically significant differences in baseline demographics, including age, gender, insurance status, and median length of available EHR history. Common comorbidities were also similarly distributed between groups, including diabetes (32% vs 30%), hypertension (41% vs 39%), hyperlipidemia (18% vs 17%), and cardiovascular disease (12% vs 11%). Balancing the cohorts on observed confounders helps isolate the exposure-outcome relationship of interest from differences due to unrelated patient characteristics.
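The text assesses balance with significance tests; a commonly used complementary diagnostic is the standardized mean difference (SMD), sketched below for the binary comorbidities listed above. This is a swapped-in technique for illustration only, `smd_binary` is a hypothetical helper, and the sketch assumes the first percentage in each pair refers to cases.

```python
import math


def smd_binary(p_cases, p_controls):
    """Standardized mean difference for a binary covariate from two prevalences."""
    pooled_sd = math.sqrt((p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / 2)
    return (p_cases - p_controls) / pooled_sd


# Prevalences from the text; the first value of each pair is assumed to be the case group.
COMORBIDITIES = {
    "diabetes": (0.32, 0.30),
    "hypertension": (0.41, 0.39),
    "hyperlipidemia": (0.18, 0.17),
    "cardiovascular disease": (0.12, 0.11),
}

for name, (p_case, p_ctrl) in COMORBIDITIES.items():
    print(f"{name}: SMD = {smd_binary(p_case, p_ctrl):.3f}")
```

All four values come out well below 0.1, a threshold often used to indicate negligible imbalance. Unlike a p-value, the SMD does not shrink or grow with sample size, so it describes the size of the imbalance directly.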

Feature Distributions

Among the 134 EHR-derived features, several clinically relevant variables differed notably between cases and controls. Median erythrocyte sedimentation rate (ESR) was significantly elevated in cases (52 mm/hr vs 16 mm/hr in controls). Similarly, median C-reactive protein (CRP) was 5.3 mg/dL in cases vs 1.8 mg/dL in controls. Documented paresthesias, numbness, tingling, or burning pain appeared in 87% of case histories compared with only 18% of control histories. Sensory deficits on clinical examination and abnormal nerve conduction findings were also substantially more frequent among cases. These differences are consistent with the known diagnostic features and risk factors of vasculitic neuropathy.

Model Performance

Of the supervised classification algorithms compared with five-fold cross-validation on the training data, L2-regularized logistic regression achieved the highest discrimination for predicting onset of vasculitic neuropathy within 12 months. The receiver operating characteristic (ROC) curve, which traces sensitivity against 1 − specificity across all decision thresholds, yielded an AUC of 0.92 (CI 0.89–0.94). At the predefined operating threshold chosen to maximize Youden's index (sensitivity + specificity − 1), performance on the held-out validation data was: accuracy 0.87, sensitivity 0.85, specificity 0.83, and F1-score 0.81. Feature weights were highest for ESR (sedimentation rate), cytokine levels, sensorimotor complaints, and vascular imaging markers, in line with clinical intuition.
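Threshold selection by Youden's index and the reported metrics can be computed as in the following sketch. The helper names are hypothetical; candidate thresholds are taken from the observed scores, scores at or above the threshold count as positive predictions, and F1 combines precision (positive predictive value) with sensitivity.

```python
def confusion_at(scores, labels, threshold):
    """Counts (tp, fp, fn, tn) when scores >= threshold are called positive."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s < threshold and y == 0 for s, y in zip(scores, labels))
    return tp, fp, fn, tn


def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion_at(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def metrics(scores, labels, threshold):
    tp, fp, fn, tn = confusion_at(scores, labels, threshold)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sens / (precision + sens) if precision + sens else 0.0
    return {"accuracy": (tp + tn) / len(labels), "sensitivity": sens,
            "specificity": spec, "f1": f1}
```

Youden's index weights sensitivity and specificity equally and ignores prevalence, whereas F1 depends on precision; with one case per four controls, F1 can therefore sit noticeably below sensitivity at the same threshold.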

External Validation

On the final unseen test dataset of 2016-2020 patient records, the model achieved an AUC of 0.90 (CI 0.84–0.93), retaining excellent discrimination. Again using the threshold that maximized Youden's index, test set performance was: accuracy 0.83, sensitivity 0.81, specificity 0.79, and F1-score 0.77. Of the 102 patients in the highest decile of predicted risk, 48 (47%) received a biopsy-confirmed diagnosis of vasculitic neuropathy within 12 months, further supporting the model's prognostic value [22-25].
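The top-decile positive predictive value is a ranking statistic: sort patients by predicted risk, take the top 10%, and count confirmed cases. The sketch below assumes the decile size is rounded up and that ties at the decile boundary are broken arbitrarily; `top_decile_ppv` is a hypothetical helper.

```python
import math


def top_decile_ppv(scores, labels):
    """Return (confirmed cases, decile size, PPV) among the highest-risk 10% of patients."""
    n_top = math.ceil(len(scores) / 10)  # assumption: round the decile size up
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    return hits, n_top, hits / n_top


# Arithmetic for the reported result: 48 confirmed cases among 102 top-decile patients.
print(f"{48 / 102:.3f}")  # 0.471, i.e. about 47%
```

Because PPV depends on prevalence, the 47% figure reflects the case:control mix of the evaluation cohort and would be expected to differ in an unselected clinical population.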

DISCUSSION

Статья