Performance Analysis of Data Normalization for K-Nearest Neighbor Classification on Disease Datasets
DOI: https://doi.org/10.14421/jiska.2024.9.3.178-191
Keywords: Data Normalization, Disease, Min-Max, Z-Score, Decimal Scaling, MaxAbs, K-Nearest Neighbor
Abstract
This study investigates four normalization methods (Min-Max, Z-Score, Decimal Scaling, and MaxAbs) for K-Nearest Neighbor (K-NN) classification on prostate cancer, kidney disease, and heart disease datasets. Because K-NN classifies instances by their distances to one another, features with very different scales can dominate the distance computation and degrade performance, which makes normalization crucial. The experiments involve datasets with different attributes, both continuous and categorical data, and binary classification. Results show that Decimal Scaling achieves 90.00% accuracy on the prostate cancer dataset, while Min-Max and Z-Score both reach 97.50% on the kidney disease dataset, where MaxAbs also performs well with 96.25%. On the heart disease dataset, Min-Max and MaxAbs attain accuracies of 82.93% and 81.95%, respectively. These findings suggest that Decimal Scaling suits datasets with few instances, few features, and a normal distribution; Min-Max and MaxAbs are better suited to datasets with many instances and a non-normal distribution; and Z-Score fits datasets with a wide range in the number of features and a near-normal distribution. Data conditions such as the number of instances, the number of features, and the data distribution affect the performance of both normalization and classification. This study aids in selecting an appropriate normalization method based on dataset characteristics to improve K-NN classification accuracy in disease diagnosis.
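The four normalization methods compared above, and the reason K-NN is sensitive to them, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (min_max, z_score, decimal_scaling, max_abs, normalize, knn_predict) and the toy data are hypothetical, Z-Score is assumed to use the population standard deviation, and Decimal Scaling is assumed to divide by 10^j, where j is the smallest integer that brings every |x| below 1. The demo shows a large-range feature dominating the Euclidean distance until the columns are rescaled.

```python
import math
from collections import Counter


def min_max(col):
    """Min-Max: rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(col), max(col)
    span = hi - lo
    # A constant column carries no information; map it to 0.0 to avoid dividing by zero.
    return [(x - lo) / span if span else 0.0 for x in col]


def z_score(col):
    """Z-Score: (x - mean) / std (assumption: population standard deviation)."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
    return [(x - mean) / std if std else 0.0 for x in col]


def decimal_scaling(col):
    """Decimal Scaling: x / 10**j, with j the smallest integer such that max|x'| < 1."""
    peak = max(abs(x) for x in col)
    j = 0
    while peak / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in col]


def max_abs(col):
    """MaxAbs: x / max|x|, mapping values into [-1, 1] without shifting them."""
    peak = max(abs(x) for x in col)
    return [x / peak if peak else 0.0 for x in col]


def normalize(rows, method):
    """Apply a per-column normalization method to a list of feature rows."""
    cols = [method(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def knn_predict(train_x, train_y, query, k=3):
    """Plain K-NN: majority label among the k nearest rows by Euclidean distance."""
    dists = sorted(
        (math.dist(row, query), label) for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: feature 1 spans about 1-5, feature 2 about 200-300.
    train_x = [[1.0, 200], [1.2, 300], [5.0, 210], [5.2, 290]]
    train_y = ["A", "A", "B", "B"]
    query = [1.1, 215]

    # Unscaled: feature 2 dominates the distance, so the vote goes to "B".
    print(knn_predict(train_x, train_y, query))

    # Min-Max rescaled: both features contribute, so the vote goes to "A".
    scaled = normalize(train_x + [query], min_max)
    print(knn_predict(scaled[:-1], train_y, scaled[-1]))
```

For brevity the demo scales the query together with the training rows. In a real train/test split, the scaling parameters (min, max, mean, standard deviation, or j) would normally be computed from the training data only and then reused on the test data.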
License
Copyright (c) 2024 Petronilia Palinggik Allorerung, Angdy Erna, Muhammad Bagussahrir, Samsu Alam
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.