Extreme Gradient Boosting Model with SMOTE for Heart Disease Classification

Authors

  • Ahmad Ubai Dullah, Department of Computer Science, Universitas Negeri Semarang, Indonesia
  • Aditya Yoga Darmawan, Department of Computer Science, Universitas Negeri Semarang, Indonesia
  • Dwika Ananda Agustina Pertiwi, Faculty of Technology Management and Business, Universiti Tun Hussein Onn Malaysia, Malaysia
  • Jumanto Unjung, Department of Computer Science, Universitas Negeri Semarang, Indonesia

DOI:

https://doi.org/10.14421/jiska.2025.10.1.48-62

Keywords:

Heart Disease, Classification, SMOTE, XGBoost

Abstract

Heart disease is one of the leading causes of death worldwide. According to data from the World Health Organization (WHO), heart disease causes 17.5 million deaths every year. However, current methods of diagnosing heart disease are still not optimal for determining the right treatment. As technology has developed, various machine learning models and data processing techniques have been proposed in search of the model that classifies heart disease with the best precision. This research aims to identify the best machine learning model for classifying heart disease, so that it can improve the effectiveness of diagnosis and help determine the right treatment for patients. It also aims to overcome the limited accuracy of existing diagnosis methods by identifying models that give the best results in processing and analysing health data, particularly for heart disease classification. In this study, the XGBoost model performed best, with an accuracy of 99%. This accuracy is higher than that of previous methods, making XGBoost a promising way to improve the accuracy of heart disease diagnosis and classification in the future.
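
The abstract does not describe how SMOTE was configured, so the following is only a minimal sketch of the core idea behind the technique named in the title: each synthetic minority-class sample is placed at a random point on the line segment between a real minority sample and one of its nearest minority neighbours. The function name `smote`, the parameters `n_new`, `k` and `seed`, and the toy data are all hypothetical and are not taken from the paper. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used before training the classifier.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples
    and their k nearest minority neighbours (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(x, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (not data from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real samples, it always lies inside the range already covered by the minority class. Oversampling of this kind is usually applied to the training split only, so that the test set keeps the original class distribution.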

Published

2025-01-31

How to Cite

Dullah, A. U., Darmawan, A. Y., Pertiwi, D. A. A., & Unjung, J. (2025). Extreme Gradient Boosting Model with SMOTE for Heart Disease Classification. JISKA (Jurnal Informatika Sunan Kalijaga), 10(1), 48–62. https://doi.org/10.14421/jiska.2025.10.1.48-62