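Optimization of Information Gain Feature Selection for the Naïve Bayes and K-Nearest Neighbor Algorithms (Optimasi Seleksi Fitur Information Gain pada Algoritma Naïve Bayes dan K-Nearest Neighbor)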
DOI:
https://doi.org/10.14421/jiska.2022.7.3.237-255
Keywords:
Prediction, Naïve Bayes, K-Nearest Neighbor, Information Gain, Confusion Matrix
Abstract
At the end of 2020, the number of late tuition fee payments increased by 3,018 out of a total of 5,535 students. The purpose of this study was to evaluate, using a confusion matrix, the performance of the Naïve Bayes and K-Nearest Neighbor algorithms in predicting late payment of tuition fees at UMKT. The dataset was sourced from the financial administration bureau and contains 12,408 records, divided 90:10. Because the Python library used in this study requires numeric data, the attributes were transformed according to their type: data that has a scale was transformed with an ordinal encoder, and data that does not have a scale was transformed with one-hot encoding. Information gain feature selection identified the four most influential attributes: faculty, study program, class, and gender. In the confusion matrix evaluation, Naïve Bayes with information gain performed best, with an accuracy of 55.19%, while K-Nearest Neighbor with information gain obtained an accuracy of only 50.76%. Using the attributes selected by information gain therefore increased the accuracy of Naïve Bayes in predicting late payment of tuition fees, but decreased the accuracy of K-Nearest Neighbor.
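To make the feature-ranking step in the abstract concrete, the following is a minimal sketch of how information gain can be computed for categorical attributes: the entropy of the class labels minus the weighted entropy that remains after splitting on an attribute. It is not the authors' code. The records, the attribute values, and the `late` label are hypothetical toy data, and the helper names `entropy` and `information_gain` are assumptions introduced for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (not taken from the study's dataset).
rows = [
    {"faculty": "A", "gender": "F", "late": "yes"},
    {"faculty": "A", "gender": "M", "late": "yes"},
    {"faculty": "A", "gender": "F", "late": "yes"},
    {"faculty": "A", "gender": "M", "late": "no"},
    {"faculty": "B", "gender": "F", "late": "no"},
    {"faculty": "B", "gender": "M", "late": "no"},
    {"faculty": "B", "gender": "F", "late": "no"},
    {"faculty": "B", "gender": "M", "late": "yes"},
]

labels = [row["late"] for row in rows]
features = ["faculty", "gender"]

# Score every attribute, then rank from most to least informative.
gains = {f: information_gain([row[f] for row in rows], labels) for f in features}
ranking = sorted(features, key=lambda f: gains[f], reverse=True)

for name in ranking:
    print(f"{name}: {gains[name]:.4f}")
```

In this toy data, `faculty` separates late from on-time payers better than `gender`, so it ranks first. A study such as this one would keep only the top-ranked attributes before training the classifiers.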
License
Copyright (c) 2022 Muhammad Norhalimi, Taghfirul Azhima Yoga Siswa
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.