Extreme Gradient Boosting Model with SMOTE for Heart Disease Classification
DOI:
https://doi.org/10.14421/jiska.2025.10.1.48-62
Keywords:
Heart Disease, Classification, SMOTE, XGBoost
Abstract
Heart disease is one of the leading causes of death worldwide. According to data from the World Health Organisation (WHO), heart disease kills 17.5 million people every year. However, current methods of diagnosing heart disease are still not optimal for determining the right treatment. As technology has advanced, various machine learning models and data processing techniques have been developed in search of models that classify heart disease with the best precision. This research aims to identify the best machine learning model for classifying heart disease, so that it can improve the effectiveness of diagnosis and help determine the right treatment for patients. It also aims to overcome the accuracy limitations of existing diagnostic methods by identifying models that give the best results in processing and analysing health data, particularly for heart disease classification. In this study, the XGBoost model was identified as the best-performing model, with an accuracy of 99%. This result indicates that XGBoost achieves higher accuracy than previous methods, making it a promising approach for improving the diagnosis and classification of heart disease.
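The title pairs XGBoost with SMOTE (Synthetic Minority Over-sampling Technique), but the abstract does not describe how the oversampling works. The following is a minimal sketch of the general SMOTE idea only, not the authors' implementation: each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority-class rows.

    Illustrative sketch of the basic SMOTE idea; names and defaults are
    assumptions, not taken from the article.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank every other minority row by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: gap = 0 gives the base row, gap near 1 the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature minority class (made-up numbers, not from the study's dataset).
minority_rows = [[63.0, 233.0], [67.0, 286.0], [62.0, 268.0], [57.0, 354.0]]
new_rows = smote(minority_rows, n_new=6, k=2)
```

In a typical workflow the synthetic rows would be added to the training split only, so that the test split still reflects the original class balance before a classifier such as XGBoost is evaluated on it.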
License
Copyright (c) 2025 Ahmad Ubai Dullah, Aditya Yoga Darmawan, Dwika Ananda Agustina Pertiwi, Jumanto Unjung

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.