Application of SMOTE in Sentiment Analysis of MyXL User Reviews on Google Play Store
DOI:
https://doi.org/10.14421/jiska.2025.10.1.74-86
Keywords:
Sentiment Analysis, Logistic Regression, Support Vector Machine, GridSearchCV, SMOTE
Abstract
Texts that express customer opinions about a product are important input for companies. By conducting sentiment analysis, companies obtain valuable information about how consumers perceive the products they market. However, real-world text datasets are often imbalanced, which biases the predictions of classification algorithms towards the majority class at the expense of the minority class. This study analyzes the sentiment of MyXL user reviews on the Google Play Store and compares the performance of the Logistic Regression and Support Vector Machine (SVM) algorithms when SMOTE is applied. The analysis uses TF-IDF to extract features and GridSearchCV to tune the models, which are evaluated on accuracy, precision, recall, and F1 score. Several scenarios for splitting the data into training and test sets are examined. SVM with SMOTE performs best, using a split of 90% training data and 10% test data, with an accuracy of 73.00%, precision of 67.13%, recall of 65.82%, and F1 score of 66.30%.
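The oversampling idea behind SMOTE can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. It assumes Euclidean distance, a randomly chosen base sample for each synthetic point, and toy 2-D vectors standing in for TF-IDF rows. The function name `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    Simplified sketch: each synthetic point starts from a randomly chosen
    minority sample (an assumption of this example, not taken from the study).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # between the base point and the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors standing in for TF-IDF rows of a minority class.
minority_class = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
new_points = smote(minority_class, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class. In practice, oversampling is normally applied to the training portion only, so that the test data keeps its original class distribution.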
License
Copyright (c) 2025 Badriyah, Totok Chamidy, Suhartono

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.