Abstract
This study proposes a hybrid feature selection approach that combines Pearson Correlation and Principal Component Analysis (PCA) to improve classification performance in opinion mining. The rapid growth of e-commerce on social media platforms such as TikTok has generated a large volume of user reviews, which are a valuable source of consumer sentiment. However, the high dimensionality of textual data makes accurate sentiment classification difficult. To address this, the proposed method first applies Pearson Correlation to remove features that correlate only weakly with the sentiment labels, then applies PCA to reduce the dimensionality of the remaining features. The dataset consists of user reviews from the TikTok Seller platform. Experiments with SVM, Naive Bayes, and Random Forest (RF) show that the hybrid approach achieves the highest accuracy, 86.2% (SVM and RF), an improvement of 0.9% over PCA alone, and recovers a 13.8% accuracy loss for Naive Bayes (from 72.0% to 83.1%). The results demonstrate that integrating correlation-based and projection-based methods yields a more compact and effective feature set. The approach is especially suited to opinion mining on noisy, high-dimensional e-commerce data.
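The two-stage idea described in the abstract (correlation filter first, then projection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the correlation threshold, the number of components, and the synthetic data are all assumptions made for illustration, since the abstract does not state them. It uses NumPy only and omits the text vectorization and the classifiers.

```python
import numpy as np

def pearson_filter(X, y, threshold=0.1):
    """Return indices of columns whose |Pearson r| with y is >= threshold.

    The threshold value is a hypothetical choice, not taken from the study.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.flatnonzero(np.abs(r) >= threshold)

def pca_project(X, n_components):
    """Project X onto its leading principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def hybrid_select(X, y, threshold=0.1, n_components=2):
    """Stage 1: Pearson filter. Stage 2: PCA on the surviving columns."""
    keep = pearson_filter(X, y, threshold)
    return keep, pca_project(X[:, keep], n_components)

# Synthetic stand-in for a document-term matrix (assumed data, not the
# TikTok Seller reviews): 3 label-related columns and 5 pure-noise columns.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)
signal = y[:, None] + 0.5 * rng.normal(size=(200, 3))
noise = rng.normal(size=(200, 5))
X = np.hstack([signal, noise])

keep, Z = hybrid_select(X, y, threshold=0.3, n_components=2)
print("kept columns:", keep)
print("reduced shape:", Z.shape)
```

In an actual experiment, the kept column indices and the PCA projection would be fitted on the training split only and then applied unchanged to the test split, so that the reported accuracy is not inflated by information from the test labels.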

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.