Abstract
The rapid growth of user-generated reviews on platforms such as Spotify calls for efficient analytical techniques to extract useful insights. This study applies a Support Vector Machine (SVM) classifier to classify user reviews by sentiment. The SVM is optimized with Forward Selection, Backward Elimination, Optimized Selection, Bagging, and AdaBoost. A dataset of approximately 10,000 Spotify reviews was compiled from diverse online sources to obtain a representative sample. The analysis reveals sentiment patterns across the positive, negative, and neutral categories, and positive reviews dominate. These patterns highlight Spotify's strengths and identify areas for improvement. However, this class imbalance makes it hard for the SVM to classify the minority classes, particularly negative sentiment. To address this, the optimization techniques above are applied to improve classification precision and recall. Preprocessing steps refine the dataset: data cleansing, tokenization, stemming, and stopword removal. TF-IDF then converts the text into the numerical features on which feature selection operates. The results show that Optimized Selection achieves the highest accuracy, 84.5%, and outperforms the other approaches. This research contributes to the development of sentiment analysis models that handle imbalanced classes more effectively. Future studies may explore deep learning techniques to further improve classification accuracy and to address the current limitations in data representation.
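The preprocessing and TF-IDF steps summarized above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- It assumes English-language reviews.
- It uses a small hypothetical stopword list.
- A crude suffix-stripping function stands in for a real stemmer.
- Stopwords are removed before stemming so that they are matched in their surface form.
- It applies the unsmoothed TF-IDF variant tf(t, d) * ln(N / df(t)). Here tf is the term count divided by the document length, N is the number of documents, and df(t) is the number of documents containing t.

Libraries such as scikit-learn use smoothed and normalized variants, so their weights will differ.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list (assumption): the study's actual list is not given.
STOPWORDS = {"the", "a", "an", "is", "are", "it", "and", "to", "of", "i", "this", "but", "so"}

# Hypothetical suffixes for a crude stemmer; a real pipeline would use an
# established stemmer appropriate to the review language instead.
SUFFIXES = ("ing", "ed", "ly", "s")

# Illustrative reviews (made up for this sketch, not taken from the dataset).
REVIEWS = [
    "Great app, love the playlists!",
    "The app keeps crashing and ads are annoying",
    "Playlists are great but ads are annoying",
]


def cleanse(text: str) -> str:
    """Lowercase and replace every non-letter, non-space character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text: str) -> list[str]:
    """Split on runs of whitespace."""
    return text.split()


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Cleanse, tokenize, drop stopwords, then stem the remaining tokens."""
    tokens = tokenize(cleanse(text))
    return [stem(t) for t in tokens if t not in STOPWORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term as (count / doc length) * ln(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append(
            {term: (c / length) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    vectors = tf_idf([preprocess(r) for r in REVIEWS])
    for review, vec in zip(REVIEWS, vectors):
        # The highest-weighted term is the one most specific to this review.
        top = max(vec, key=vec.get, default=None)
        print(f"{top!r:12} <- {review}")
```

The resulting term-weight dictionaries are the kind of sparse numerical features an SVM would be trained on. Forward Selection, Backward Elimination, or Optimized Selection would then add or remove features from this set. A term that appears in every review gets weight ln(1) = 0, which is why TF-IDF down-weights words that carry little discriminative information. The SVM training and feature-selection steps themselves are not shown here.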

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.