Comparative Analysis of The Combination of Metaheuristic and Machine Learning Algorithms

Keywords

complex dataset
diabetes prediction
disease detection
feature selection
prediction accuracy

How to Cite

Comparative Analysis of The Combination of Metaheuristic and Machine Learning Algorithms. (2026). IJID (International Journal on Informatics for Development). https://doi.org/10.14421/ijid.2026.4888

Abstract

Diabetes affects about 1.9% of the global population, mostly as Type 2 diabetes. Machine learning (ML) plays a pivotal role in improving diabetes prediction by analyzing complex datasets. Feature selection, a crucial ML pre-processing step, improves prediction accuracy by retaining relevant features and discarding irrelevant ones. This study investigates combinations of metaheuristic algorithms and ML techniques to improve the accuracy and computational efficiency of diabetes prediction. Using the PIMA, Early Stage, and Vanderbilt datasets, the experiments evaluated ten algorithm-model combinations in terms of accuracy, precision, the Wilcoxon test, and convergence curves. On PIMA, Firefly Algorithm-Logistic Regression, Bat Algorithm-Logistic Regression, and Cuckoo Search-Logistic Regression each achieved 74.72% accuracy. On Early Stage, Firefly Algorithm-Support Vector Machine and Cuckoo Search-Naïve Bayes achieved 83.39% accuracy and 96.15% precision. On Vanderbilt, Firefly Algorithm-Naïve Bayes achieved 92.88% accuracy and precision. These results highlight the potential of integrating metaheuristics with ML methods to improve clinical diagnostics. Future research should validate the robustness of these algorithms across more diverse datasets to further optimize diabetes prediction strategies.
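As a rough illustration of wrapper-based feature selection, in which a search procedure proposes feature subsets and a classifier scores them, the following minimal Python sketch may help. It is not the study's implementation: the bit-flip hill climber stands in for the Firefly, Bat, and Cuckoo Search algorithms, the nearest-centroid classifier stands in for Logistic Regression, Support Vector Machine, and Naïve Bayes, the data are synthetic, and the function names and the weight `alpha` are assumptions made only for this example.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Validation accuracy of a nearest-centroid classifier restricted to
    the features whose mask bit is 1 (a stand-in for the study's models)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # no features selected: treat as worst case
    sums, counts = {}, {}
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for k, j in enumerate(cols):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def squared_distance(row, centroid):
        return sum((row[j] - centroid[k]) ** 2 for k, j in enumerate(cols))

    correct = 0
    for row, label in zip(X_val, y_val):
        pred = min(centroids, key=lambda c: squared_distance(row, centroids[c]))
        correct += int(pred == label)
    return correct / len(y_val)


def fitness(mask, data, alpha=0.99):
    """Weighted sum of accuracy and the share of discarded features.
    The weight alpha is an assumed value, not taken from the study."""
    accuracy = nearest_centroid_accuracy(*data, mask)
    selected_ratio = sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * (1 - selected_ratio)


def bit_flip_search(n_features, data, iterations=200, seed=0):
    """Hill climbing over binary masks, starting from all features.
    A deliberately simple stand-in for a metaheuristic optimizer."""
    rng = random.Random(seed)
    best = [1] * n_features
    best_fit = fitness(best, data)
    for _ in range(iterations):
        candidate = best[:]
        candidate[rng.randrange(n_features)] ^= 1  # flip one feature on/off
        candidate_fit = fitness(candidate, data)
        if candidate_fit >= best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit


def make_toy_data(n=200, seed=1):
    """Synthetic data: 2 informative features followed by 4 noisy ones."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 5.0) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]


if __name__ == "__main__":
    data = make_toy_data()
    mask, score = bit_flip_search(6, data)
    print("selected mask:", mask, "fitness:", round(score, 4))
```

Because the search starts from the full feature set and accepts a candidate only when its fitness does not decrease, the returned subset never scores worse than using all features on the validation split. A population-based metaheuristic would replace `bit_flip_search` while keeping the same `fitness` interface.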


Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.