Hyperparameter Optimization of Ensemble Learning for Predicting Diabetes Progression with Explainable AI

Authors

  • David Suharjanto, Sunan Kalijaga State Islamic University Yogyakarta
  • Muhammad Syafiq Akmal, Sunan Kalijaga State Islamic University Yogyakarta
  • Nur Fikri Khuluq, Sunan Kalijaga State Islamic University Yogyakarta
  • Muh Naufal Muzhaffar, Sunan Kalijaga State Islamic University Yogyakarta
  • Maria Ulfah Siregar, Sunan Kalijaga State Islamic University Yogyakarta

DOI:

https://doi.org/10.14421/jiska.5953

Keywords:

Ensemble Learning, Hyperparameter Optimization, Random Forest, XGBoost, Diabetes Prediction

Abstract

This research optimizes and evaluates ensemble learning models for predicting diabetes progression by combining hyperparameter tuning with explainable artificial intelligence techniques. Experiments used the scikit-learn diabetes dataset, which contains 442 samples with ten numerical features describing patients' clinical condition. The data were split into 80% for training and 20% for testing. Two ensemble methods were explored: Random Forest Regressor (bagging) and XGBoost Regressor (boosting). Hyperparameter optimization was carried out using RandomizedSearchCV and BayesianSearchCV under five-fold cross-validation. Model performance was evaluated using MAE, MSE, RMSE, and R², while interpretability was examined through SHAP summary plots. The results indicate that BayesianSearchCV consistently delivered larger performance gains than random search. In particular, the optimized XGBoost model achieved an R² of 0.5018, a relative improvement of 19.8% over the baseline model (R² = 0.4188), and reduced RMSE from 55.49 to 51.37. SHAP analysis showed that serum triglycerides, body mass index, and blood pressure were the most influential features. Overall, the findings suggest that Bayesian hyperparameter optimization can effectively improve ensemble regression performance in medical prediction tasks with limited data.
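The workflow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it covers only the Random Forest and RandomizedSearchCV branch, because XGBoost, Bayesian search, and SHAP need third-party packages beyond scikit-learn. The random seed, the search space, and the helper names `regression_metrics` and `relative_gain` are assumptions made for the example, so its scores will not match the reported ones.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def regression_metrics(y_true, y_pred):
    """Return the four metrics named in the abstract (hypothetical helper)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_true, y_pred),
    }


def relative_gain(baseline, tuned):
    """Relative improvement of `tuned` over `baseline`, in percent."""
    return 100.0 * (tuned - baseline) / baseline


# 442 samples with 10 features, split 80/20 as stated in the abstract.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # the seed is an assumption
)

# Baseline model with default hyperparameters.
baseline = RandomForestRegressor(random_state=42).fit(X_train, y_train)
baseline_scores = regression_metrics(y_test, baseline.predict(X_test))

# Random search under five-fold cross-validation.
# The search space below is illustrative only; the paper's space is not given here.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": randint(50, 200),
        "max_depth": randint(2, 12),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=5,
    cv=5,
    scoring="r2",
    random_state=42,
)
search.fit(X_train, y_train)
tuned_scores = regression_metrics(y_test, search.best_estimator_.predict(X_test))

print("baseline:", baseline_scores)
print("tuned:   ", tuned_scores)

# Arithmetic behind the reported XGBoost gain: R2 from 0.4188 to 0.5018.
print(f"reported R2 gain: {relative_gain(0.4188, 0.5018):.1f}%")
```

The last line shows that the 19.8% figure is a relative gain, (0.5018 − 0.4188) / 0.4188, rather than an absolute increase in R². With optional packages installed, the same structure would presumably extend to an XGBoost regressor, a Bayesian search class, and a SHAP summary plot on the best estimator.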

Published

2026-05-25

Section

Articles

How to Cite

Optimasi Hyperparameter Ensemble Learning untuk Prediksi Perkembangan Penyakit Diabetes dengan Explainable AI. (2026). JISKA (Jurnal Informatika Sunan Kalijaga), 11(2), 182-194. https://doi.org/10.14421/jiska.5953
