Application of SMOTE in Sentiment Analysis of MyXL User Reviews on Google Play Store

Authors

  • Badriyah Badriyah, UIN Maulana Malik Ibrahim Malang
  • Totok Chamidy, UIN Maulana Malik Ibrahim Malang
  • Suhartono Suhartono, UIN Maulana Malik Ibrahim Malang

DOI:

https://doi.org/10.14421/jiska.2025.10.1.74-86

Keywords:

Sentiment Analysis, Logistic Regression, Support Vector Machine, GridSearchCV, SMOTE

Abstract

Texts that express customer opinions about a product are important input for companies. By conducting sentiment analysis, companies obtain valuable information about how consumers perceive the products they market. However, real-world text datasets are often imbalanced, which biases the predictions of classification algorithms towards the majority class at the expense of the minority class. This study analyzes the sentiment of MyXL user reviews on the Google Play Store and compares the performance of the Logistic Regression and Support Vector Machine (SVM) algorithms when SMOTE is applied. The analysis uses TF-IDF to extract features and GridSearchCV to tune the models with respect to the accuracy, precision, recall, and F1 score evaluation metrics. Several scenarios for splitting the data into training and test sets are evaluated. SVM with SMOTE performs best, using a split of 90% training data and 10% test data, with an accuracy of 73.00%, precision of 67.13%, recall of 65.82%, and F1 score of 66.30%.
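The core idea of SMOTE (Chawla et al., 2002) can be illustrated with a minimal sketch: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The code below is not the authors' implementation. The function name `smote`, its parameters, and the toy two-dimensional vectors (standing in for TF-IDF rows) are all hypothetical, and the study most likely relied on a library implementation. Oversampling only the training split, as assumed here, is common practice so that synthetic samples do not leak into the test data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority samples.

    minority: list of equal-length feature vectors (hypothetical stand-ins
    for TF-IDF rows of the minority sentiment class).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced training set (assumed data): 6 majority rows, 3 minority rows.
majority = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
            [0.1, 0.2], [0.0, 0.0], [0.2, 0.2]]
minority = [[0.9, 0.8], [1.0, 1.0], [0.8, 0.9]]

# Generate just enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

Because every synthetic row is a convex combination of two real minority rows, the new samples stay inside the region the minority class already occupies instead of duplicating existing reviews exactly.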

Published

2025-01-31

How to Cite

Badriyah, B., Chamidy, T., & Suhartono, S. (2025). Application of SMOTE in Sentiment Analysis of MyXL User Reviews on Google Play Store. JISKA (Jurnal Informatika Sunan Kalijaga), 10(1), 74–86. https://doi.org/10.14421/jiska.2025.10.1.74-86