Application of SMOTE in Sentiment Analysis of MyXL User Reviews on Google Play Store
DOI: https://doi.org/10.14421/jiska.2025.10.1.74-86
Keywords: sentiment analysis, logistic regression, support vector machine, GridSearchCV, SMOTE
Abstract
Real-world applications often involve imbalanced text datasets. Imbalance biases the predictions of classification algorithms toward the majority class and causes them to neglect the minority class, so high accuracy does not reflect a model's actual performance. This study uses the SMOTE technique to balance the classes for sentiment analysis on a dataset of MyXL user reviews from the Google Play Store. We compare the performance of the Logistic Regression and Support Vector Machine algorithms on the imbalanced data and on the data balanced with SMOTE. Text features are extracted using TF-IDF. The models are tuned through GridSearchCV in Scikit-learn and evaluated on accuracy, precision, recall, and F1-score. The best performance was achieved by combining SMOTE with the SVM algorithm, which yielded an accuracy of 73.00%, a precision of 67.13%, a recall of 65.82%, and an F1-score of 66.30%.
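The following sketch makes the balancing step concrete by implementing the core SMOTE idea in plain Python. Each synthetic minority-class point is placed at a random position on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is the interpolation described by Chawla et al. in the original 2002 SMOTE paper. The sketch is a minimal, dependency-free illustration and is not the authors' implementation. The abstract does not say which SMOTE implementation was used alongside Scikit-learn. The following are all hypothetical choices made for illustration:

* the function names `smote` and `balance_with_smote`,
* the choice to cycle through the minority samples,
* the toy two-dimensional vectors standing in for TF-IDF features.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, rng=None):
    """Create n_synthetic minority-class points by SMOTE-style interpolation.

    Each new point lies on the segment between a minority sample and one of
    its k nearest neighbours within the same (minority) class.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority-class neighbours of each sample (by index).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for n in range(n_synthetic):
        i = n % len(minority)              # cycle through the minority samples
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()                 # uniform random gap in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def balance_with_smote(X, y, k=5, seed=None):
    """Oversample every smaller class up to the size of the largest class."""
    rng = random.Random(seed)              # one shared generator for reproducibility
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    target = max(counts.values())
    X_out = [list(row) for row in X]       # original rows are kept unchanged
    y_out = list(y)
    for label, count in counts.items():
        if count < target:
            minority = [list(row) for row, lab in zip(X, y) if lab == label]
            new_rows = smote(minority, target - count, k=k, rng=rng)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical 2-D vectors standing in for TF-IDF features of reviews.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [0.9, 0.8], [1.0, 1.0], [0.8, 0.9], [0.95, 0.85], [0.85, 0.95]]
    y = ["neg", "neg", "neg", "pos", "pos", "pos", "pos", "pos"]
    X_bal, y_bal = balance_with_smote(X, y, k=2, seed=0)
    print({lab: y_bal.count(lab) for lab in sorted(set(y_bal))})
    # → {'neg': 5, 'pos': 5}
```

In a real pipeline, oversampling of this kind would normally be applied only to the training data, for example inside each cross-validation fold searched by GridSearchCV. This prevents synthetic points derived from test reviews from leaking into the evaluation.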
License
Copyright (c) 2025 Badriyah, Totok Chamidy, Suhartono

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.