Hyperparameter Optimization of Ensemble Learning for Predicting Diabetes Progression with Explainable AI
DOI:
https://doi.org/10.14421/jiska.5953
Keywords:
Ensemble Learning, Hyperparameter Optimization, Random Forest, XGBoost, Diabetes Prediction
Abstract
This research focuses on optimizing and assessing ensemble learning models for predicting diabetes progression by combining hyperparameter tuning and explainable artificial intelligence techniques. Experiments were conducted using the scikit-learn diabetes dataset, which contains 442 samples with ten numerical features representing patients’ clinical conditions. The data were split into 80% for training and 20% for testing. Two ensemble methods were explored: Random Forest Regressor (bagging) and XGBoost Regressor (boosting). Hyperparameter optimization was carried out using RandomizedSearchCV and BayesianSearchCV under a five-fold cross-validation scheme. Model performance was evaluated using MAE, MSE, RMSE, and R² metrics, while interpretability was examined through SHAP summary plots. The results indicate that BayesianSearchCV consistently delivered superior performance gains compared to random search. In particular, the optimized XGBoost model achieved an R² score of 0.5018, improving by 19.8% over the baseline model (R² = 0.4188), and reduced RMSE from 55.49 to 51.37. SHAP analysis showed that serum triglycerides, body mass index, and blood pressure were the most influential features. Overall, the findings suggest that Bayesian-based hyperparameter optimization can effectively improve ensemble regression performance in medical prediction tasks involving limited datasets.
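The workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it covers only the Random Forest branch with RandomizedSearchCV, and the search space, n_iter, the scoring choice, and random_state=42 are assumptions chosen for the example. The Bayesian search step is omitted; in scikit-optimize the corresponding class is named BayesSearchCV, which is presumably what the abstract refers to. The helper relative_gain is a hypothetical name used to reproduce the reported 19.8% figure from the two R² values.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def relative_gain(baseline, tuned):
    """Relative improvement of a tuned score over a baseline, in percent."""
    return (tuned - baseline) / baseline * 100.0


def evaluate(model, X_test, y_test):
    """Compute the four metrics named in the abstract on held-out data."""
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_test, pred),
    }


# scikit-learn diabetes dataset: 442 samples, 10 numerical features.
X, y = load_diabetes(return_X_y=True)

# 80% training / 20% testing split (the seed is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical search space; the paper's actual ranges are not given here.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 2, 4, 8],
    "max_features": ["sqrt", 1.0],
}

# Random search under five-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=42,
)
search.fit(X_train, y_train)

metrics = evaluate(search.best_estimator_, X_test, y_test)
print(search.best_params_)
print(metrics)

# Reported XGBoost R² values: baseline 0.4188, optimized 0.5018.
print(round(relative_gain(0.4188, 0.5018), 1))  # 19.8
```

The scores this sketch prints will differ from the reported ones, since the model, search space, and seed are not those of the study. Tuning on cross-validated RMSE while reporting all four metrics on the held-out 20% keeps the test split out of the selection step.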
License
Copyright (c) 2026 David Suharjanto, Muhammad Syafiq Akmal, Nur Fikri Khuluq, Muh Naufal Muzhaffar, Maria Ulfah Siregar

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.




