An Interpretable Hybrid Machine Learning Framework for Robust Type II Diabetes Prediction Using Electronic Health Records

Dr. Mazher Khan; Dr. Prasun Chakrabarty; Dr. Bhuvan Unhelkar; Dr. Amairullah Khan Lodhi; Dr. Ali Hussain

doi:10.63665/IJAICE.0201.01

Authors

Dr. Mazher Khan, University of South Florida, Florida, USA
Dr. Prasun Chakrabarty, Department of Computer Science and Engineering, Sir Padampat Singhania University, Udaipur, Rajasthan, India
Dr. Bhuvan Unhelkar, University of South Florida, Florida, USA
Dr. Amairullah Khan Lodhi, Department of Electronics and Communication Engineering, Shadan College of Engineering & Technology, Hyderabad, India
Dr. Ali Hussain, Department of Computer Science and Engineering, Srinidhi Institute of Science and Technology, Hyderabad, India

Keywords:

Electronic Health Records (EHRs), Medical Data Analytics, Predictive Modeling, Explainable AI, Random Forest, SHAP, Domain Adaptation, Digital Health Systems.

Abstract

Type II diabetes mellitus (T2DM) is a rapidly growing global health problem that requires timely and accurate risk identification to mitigate chronic conditions. Electronic health records (EHRs) are a rich source of longitudinal clinical data, but predictive modeling with EHRs is hampered by data quality problems, high-dimensional feature spaces, and limited generalizability across healthcare facilities. This paper presents an interpretable hybrid machine-learning framework for predicting T2DM with high accuracy from structured EHR data. The proposed pipeline combines systematic data preprocessing (imputation, outlier handling, normalization), a hybrid feature selection strategy (Random Forest importance and SHAP-based explainability), and domain-aware model validation to improve generalization. The framework is evaluated on benchmark datasets such as Pima Indians, UCI Diabetes 130-US Hospitals, and MIMIC-III, and on a simulated longitudinal EHR dataset. The experiments show a gradual improvement in performance across processing stages, reaching 91% accuracy and an AUROC of 0.93 with the Random Forest + SHAP model. Feature importance analysis identifies glucose, A1C, insulin, BMI, and blood pressure as the leading predictors; these are clinically established and consistent with diagnostic guidelines. A comparison with LSTM and transformer-based models shows that the proposed method strikes a good balance between predictive accuracy and interpretability. The findings show that hybrid model explainability and structured preprocessing are highly effective in increasing robustness, fairness, and clinical usability. The proposed framework provides a scalable, reliable blueprint for practical deployment in digital health and clinical decision support systems.
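
The preprocessing stage named in the abstract (imputation, outlier handling, normalization) can be illustrated with a minimal sketch. The abstract does not say which techniques the authors used, so the choices here (median imputation, clipping to the 1.5 x IQR fences, min-max scaling) are assumptions made for illustration only, as are the function names and the sample glucose values.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to the fences [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def preprocess(column):
    """Apply the three steps in order: impute, clip outliers, normalize."""
    return min_max(clip_iqr(impute_median(column)))


# Hypothetical glucose readings with one missing value and one extreme value.
glucose = [85, 90, None, 110, 95, 100, 400, 105]
cleaned = preprocess(glucose)
print(cleaned)
```

Clipping before scaling keeps a single extreme reading from compressing the remaining values toward zero. In a real evaluation the imputation value, fences, and scaling bounds would be estimated on the training split only and then reused on held-out data.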
