Diabetes Mellitus Detection Using the XGBoost and LightGBM Ensemble Technique
DOI:
https://doi.org/10.14421/jiska.4908
Keywords:
Diabetes Mellitus, Machine Learning, XGBoost, LightGBM, Early Detection
Abstract
Diabetes mellitus is a metabolic disease characterized by elevated blood sugar levels caused by impaired insulin secretion, impaired insulin action, or both. It has a major impact on public health and contributes to high morbidity and mortality rates in many countries, so prevention and early detection are essential to reduce its adverse effects. This study analyzes and applies machine learning algorithms to the detection of diabetes mellitus, focusing on the XGBoost and LightGBM algorithms. The dataset includes features related to diabetes risk factors, such as age, gender, body mass index (BMI), hypertension, smoking history, HbA1c level, and blood glucose level. Preprocessing was performed to clean the data and to balance it using the SMOTE-Tomek technique. The models were then built and evaluated with K-Fold cross-validation to measure their accuracy and stability. The XGBoost model achieved 97.31% accuracy and the LightGBM model achieved 97.26%. Combining the two models through blending raised accuracy to 97.51%, indicating that combining models can improve prediction performance. These results show the potential of machine learning algorithms, particularly XGBoost and LightGBM, for detecting diabetes mellitus accurately and efficiently, and they can contribute to the development of decision support systems for more effective early diagnosis of diabetes.
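The abstract does not say how the two models were blended. As a rough illustration only, the sketch below assumes the simplest variant: a weighted average of each model's predicted probability for the positive class, thresholded at 0.5. The function names, the `weight_a` parameter, and the toy labels and probabilities are all hypothetical. They are not taken from the study and do not reproduce its reported accuracies.

```python
from statistics import mean


def blend_probabilities(p_a, p_b, weight_a=0.5):
    """Weighted average of two models' positive-class probabilities.

    Assumption: blending is a plain weighted average; the study may
    have used a different scheme (e.g. a meta-learner on a holdout set).
    """
    if len(p_a) != len(p_b):
        raise ValueError("probability lists must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(p_a, p_b)]


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return mean(1.0 if p == y else 0.0 for p, y in zip(preds, y_true))


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic.
y_true = [0, 0, 1, 1, 0, 1]
# Stand-ins for the probabilities an XGBoost and a LightGBM classifier
# might output on the same held-out samples (invented values).
p_xgb = [0.2, 0.6, 0.7, 0.4, 0.1, 0.9]
p_lgbm = [0.3, 0.3, 0.8, 0.7, 0.6, 0.6]

blended = blend_probabilities(p_xgb, p_lgbm)

print("model A accuracy:", accuracy(y_true, p_xgb))
print("model B accuracy:", accuracy(y_true, p_lgbm))
print("blended accuracy:", accuracy(y_true, blended))
```

In this toy case the two models make mistakes on different samples, so averaging their probabilities corrects both sets of errors. That is the intuition behind the small gain the abstract reports for the blended model, though real gains depend on how correlated the models' errors are.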
License
Copyright (c) 2025 Naufal Adhi Pratama, Danang Wahyu Utomo

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.




