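Performance Analysis of Data Normalization for K-Nearest Neighbor Classification on Disease Datasets (Analisis Performa Normalisasi Data untuk Klasifikasi K-Nearest Neighbor pada Dataset Penyakit)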

Authors

  • Petronilia Palinggik Allorerung, Universitas Dipa Makassar
  • Angdy Erna, Universitas Dipa Makassar
  • Muhammad Bagussahrir, Universitas Dipa Makassar
  • Samsu Alam, Universitas Dipa Makassar

DOI:

https://doi.org/10.14421/jiska.2024.9.3.178-191

Keywords:

Data Normalization, Disease, Min-Max, Z-Score, Decimal Scaling, MaxAbs, K-Nearest Neighbor

Abstract

This study compares four normalization methods (Min-Max, Z-Score, Decimal Scaling, and MaxAbs) for K-Nearest Neighbor (K-NN) classification on prostate cancer, kidney disease, and heart disease datasets. Because K-NN relies on distances between samples, features measured on very different scales can distort those distances and degrade accuracy, which makes normalization crucial. The experiments cover datasets with different attributes, a mix of continuous and categorical data, and binary classification. On the prostate cancer dataset, Decimal Scaling achieves 90.00% accuracy. On the kidney disease dataset, Min-Max and Z-Score both reach 97.50%, and MaxAbs reaches 96.25%. On the heart disease dataset, Min-Max and MaxAbs attain 82.93% and 81.95%, respectively. These findings suggest that Decimal Scaling suits datasets with few instances, limited features, and a normal distribution; Min-Max and MaxAbs suit datasets with numerous instances and a non-normal distribution; and Z-Score suits datasets with a wide range of feature numbers and a near-normal distribution. Data conditions such as the number of instances, the number of features, and the data distribution therefore affect the performance of both normalization and classification. The study helps in selecting a normalization method based on dataset characteristics to improve K-NN classification accuracy in disease diagnosis.
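To make the four methods concrete, the following is a minimal sketch using their textbook definitions, not the authors' implementation. The function names, the two-feature toy data, and the use of a 1-nearest-neighbor lookup are all assumptions chosen for illustration. Each scaler is fitted on the training column only and then applied to new values.

```python
import math
from statistics import mean, pstdev

# Each fit_* function takes one training column and returns a scaling function.
# Names and data below are hypothetical, for illustration only.

def fit_min_max(col):
    """Min-Max: map the training range [min, max] onto [0, 1]."""
    lo, span = min(col), max(col) - min(col)
    return lambda v: (v - lo) / span if span else 0.0

def fit_z_score(col):
    """Z-Score: subtract the mean, divide by the (population) std deviation."""
    mu, sigma = mean(col), pstdev(col)
    return lambda v: (v - mu) / sigma if sigma else 0.0

def fit_decimal_scaling(col):
    """Decimal Scaling: divide by 10^j, with j the smallest integer
    such that every scaled training value has absolute value below 1."""
    peak = max(abs(v) for v in col)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return lambda v: v / 10 ** j

def fit_max_abs(col):
    """MaxAbs: divide by the largest absolute training value."""
    peak = max(abs(v) for v in col)
    return lambda v: v / peak if peak else 0.0

def fit_columns(rows, fit):
    """Fit one scaler per feature column of the training rows."""
    return [fit(list(col)) for col in zip(*rows)]

def apply_scalers(scalers, row):
    """Scale a single row with the per-column scalers."""
    return [scale(v) for scale, v in zip(scalers, row)]

def nearest_label(train, labels, query):
    """1-NN: return the label of the closest training row (Euclidean)."""
    dists = [math.dist(row, query) for row in train]
    return labels[dists.index(min(dists))]

# Toy data: the second feature has a much larger magnitude than the first.
train = [(1.0, 1000.0), (5.0, 1010.0)]
labels = [0, 1]
query = (1.2, 1008.0)

raw_prediction = nearest_label(train, labels, query)

scalers = fit_columns(train, fit_min_max)
scaled_train = [apply_scalers(scalers, row) for row in train]
scaled_query = apply_scalers(scalers, query)
scaled_prediction = nearest_label(scaled_train, labels, scaled_query)

print("raw features:     ", raw_prediction)     # 1
print("Min-Max features: ", scaled_prediction)  # 0
```

On this toy data the predicted label flips once both features are brought onto a common scale, because the large-magnitude feature no longer dominates the Euclidean distance. Swapping `fit_min_max` for any of the other three fitters in `fit_columns` allows the same comparison on other data.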

Published

2024-09-25

How to Cite

Allorerung, P. P., Erna, A., Bagussahrir, M., & Alam, S. (2024). Analisis Performa Normalisasi Data untuk Klasifikasi K-Nearest Neighbor pada Dataset Penyakit. JISKA (Jurnal Informatika Sunan Kalijaga), 9(3), 178–191. https://doi.org/10.14421/jiska.2024.9.3.178-191