An Interpretable Hybrid Machine Learning Framework for Robust Type II Diabetes Prediction Using Electronic Health Records
Keywords:
Electronic Health Records (EHRs), Medical Data Analytics, Predictive Modeling, Explainable AI, Random Forest, SHAP, Domain Adaptation, Digital Health Systems.
Abstract
Type II diabetes mellitus (T2DM) is a rapidly growing global health problem that
requires timely and accurate risk identification to mitigate chronic conditions. EHRs are a rich source
of longitudinal clinical data, but predictive modeling with EHRs is hampered by data quality problems, high-dimensional
feature spaces, and limited generalizability across healthcare facilities. This paper presents an
interpretable hybrid machine-learning framework for predicting T2DM with high accuracy using
structured EHR data. The proposed pipeline combines systematic data preprocessing (imputation, outlier
handling, normalization), a hybrid feature selection strategy (Random Forest importance and SHAP-based
explainability), and domain-conscious model validation to improve generalization. The framework is
evaluated on benchmark datasets, including Pima Indians, UCI Diabetes 130-US Hospitals,
MIMIC-III, and a simulated longitudinal EHR dataset. The experimental results show a gradual
improvement in performance across processing stages, reaching 91% accuracy and an AUROC of 0.93 with
the Random Forest + SHAP model. Feature-importance analysis identifies glucose, A1C, insulin, BMI, and
blood pressure as the leading predictors; these are clinically established and align
with diagnostic guidelines. A comparison of the proposed methodology with LSTM and transformer-based
models shows that it strikes a good balance between predictive accuracy and interpretability.
The findings show that hybrid model explainability and structured preprocessing are highly effective in
increasing robustness, fairness, and clinical usability. The proposed framework provides a scalable, reliable
blueprint for practical implementation in digital health and clinical decision support systems.
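
The preprocessing stage named in the abstract (imputation, outlier handling, normalization) can be illustrated with a minimal sketch. The abstract does not say which specific methods the authors used, so the choices here are assumptions: median imputation, clipping to Tukey's IQR fences, and min-max scaling. The function names and the glucose readings are hypothetical and serve only as an illustration.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values (assumed method)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] (assumed method)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(column):
    """Apply imputation, outlier clipping, and normalization in that order."""
    return min_max(clip_iqr(impute_median(column)))


# Hypothetical glucose readings (mg/dL): one missing value, one extreme entry.
glucose = [85, 90, None, 110, 120, 95, 600]
scaled = preprocess(glucose)
```

In this sketch the extreme reading is pulled back to the upper fence before scaling, so it no longer compresses the remaining values toward zero. In a real pipeline the imputation value, fences, and scaling range would be estimated on the training split only and then reused on validation data.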
License
Copyright (c) 2026 Authors

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


