An Interpretable Hybrid Machine Learning Framework for Robust Type II Diabetes Prediction Using Electronic Health Records

Authors

  • Dr. Mazher Khan, University of South Florida, Florida, USA
  • Dr. Prasun Chakrabarty, Department of Computer Science and Engineering, Sir Padampat Singhania University, Udaipur, Rajasthan, India
  • Dr. Bhuvan Unhelkar, University of South Florida, Florida, USA
  • Dr. Amairullah Khan Lodhi, Department of Electronics and Communication Engineering, Shadan College of Engineering & Technology, Hyderabad, India
  • Dr. Ali Hussain, Department of Computer Science and Engineering, Srinidhi Institute of Science and Technology, Hyderabad, India

Keywords:

Electronic Health Records (EHRs), Medical Data Analytics, Predictive Modeling, Explainable AI, Random Forest, SHAP, Domain Adaptation, Digital Health Systems.

Abstract

Type II diabetes mellitus (T2DM) is a rapidly growing global health problem that requires timely
and accurate risk identification to mitigate chronic complications. Electronic health records (EHRs)
are a rich source of longitudinal clinical data, but predictive modeling with EHRs is hampered by data
quality problems, high-dimensional feature spaces, and limited generalizability across healthcare
facilities. This paper presents an interpretable hybrid machine-learning framework for accurate T2DM
prediction from structured EHR data. The proposed pipeline combines systematic data preprocessing
(imputation, outlier handling, normalization), a hybrid feature selection strategy (Random Forest
importance and SHAP-based explainability), and domain-aware model validation to improve
generalization. The framework is evaluated on benchmark datasets including Pima Indians, UCI Diabetes
130-US Hospitals, and MIMIC-III, as well as a simulated longitudinal EHR dataset. The experiments show
a progressive improvement in performance across processing stages, reaching 91% accuracy and an AUROC
of 0.93 with the Random Forest + SHAP model. The predictors ranked highest by feature importance are
glucose, A1C, insulin, BMI, and blood pressure, all clinically established and consistent with
diagnostic guidelines. A comparison with LSTM and transformer-based models shows that the proposed
method strikes a good balance between predictive accuracy and interpretability. The findings show that
hybrid model explainability and structured preprocessing are highly effective in increasing
robustness, fairness, and clinical usability. The proposed framework offers a scalable, reliable
blueprint for practical deployment in digital health and clinical decision support systems.
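
The preprocessing and attribution steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which imputation, outlier, or normalization rules were used, so median imputation, IQR clipping, and z-scoring are assumptions here. No Random Forest is trained either; `toy_risk` is a hypothetical hand-written score, and `shapley_values` computes exact Shapley values against a baseline, the quantity that SHAP-style explanations are built on. All function names and numbers are illustrative.

```python
from itertools import combinations
from math import factorial
from statistics import mean, median, pstdev, quantiles


def impute_median(col):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in col if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in col]


def clip_iqr(col, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (an assumed outlier rule)."""
    q1, _, q3 = quantiles(col, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, lo), hi) for v in col]


def zscore(col):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd if sd else 0.0 for v in col]


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        return f([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            # Shapley weight for coalitions of size r that exclude feature i
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for s in combinations(others, r):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def toy_risk(z):
    """Hypothetical risk score over (glucose, BMI, blood pressure); not a fitted model."""
    glucose, bmi, bp = z
    return 0.6 * glucose + 0.3 * bmi + 0.1 * glucose * bp


if __name__ == "__main__":
    # Made-up glucose readings with one missing value and one extreme value.
    glucose_raw = [95, 110, None, 102, 88, 130, 99, 105, 92, 400]
    print(zscore(clip_iqr(impute_median(glucose_raw))))
    # Per-feature contributions to the toy score relative to an all-zero baseline.
    print(shapley_values(toy_risk, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The exact computation enumerates every feature coalition, so it is practical only for a handful of features; with a trained tree ensemble, a dedicated SHAP implementation would normally be used instead.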

Published

2026-03-06

How to Cite

An Interpretable Hybrid Machine Learning Framework for Robust Type II Diabetes Prediction Using Electronic Health Records. (2026). International Journal of Artificial Intelligence and Computer Electronics, 2(1), 1-9. https://ijaice.com/journal/index.php/ijaice/article/view/7