Perbandingan Algoritma Klasifikasi Sentimen Twitter Terhadap Insiden Kebocoran Data Tokopedia

DOI:

https://doi.org/10.14421/jiska.2021.6.2.120-129

Abstract

Public responses posted on Twitter in reaction to the Tokopedia data leak incident were used as a data set to compare the performance of three classifiers, each trained with supervised learning, at classifying the sentiment of the text. Each tweet was classified as positive, negative, or neutral. The study compares Random Forest, Support-Vector Machine (SVM), and Logistic Regression classifiers. The data was scraped automatically and used to evaluate several models; the SVM-based model achieved the highest F1-score, 0.503583, making SVM the best-performing classifier in this comparison.
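To make the comparison metric concrete, the following is a minimal sketch of how an F1-score can be computed for a three-class sentiment task and used to rank classifiers. It is an illustration only, not the authors' code: the abstract does not state which averaging scheme was used, so macro-averaging is an assumption here, and the function names (`per_class_f1`, `macro_f1`, `best_classifier`) and the toy labels are hypothetical.

```python
# Assumption: macro-averaged F1 over the three sentiment classes.
# The abstract reports an F1-score but does not specify the averaging scheme.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as 'positive' (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the F1 for this class is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


def best_classifier(y_true, predictions):
    """Return (name of the top-scoring classifier, dict of all scores)."""
    scores = {name: macro_f1(y_true, y_pred) for name, y_pred in predictions.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy labels, not taken from the paper's data set.
    y_true = ["positive", "negative", "neutral", "negative"]
    predictions = {
        "model_a": ["positive", "negative", "negative", "negative"],
        "model_b": ["neutral", "negative", "negative", "positive"],
    }
    name, scores = best_classifier(y_true, predictions)
    print(name, scores)
```

Macro-averaging weights each class equally, which matters when one sentiment class dominates the data; a weighted or micro average would give different numbers for the same predictions.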

Author Biographies

Nadhif Ikbar Wibowo, Institut Teknologi Sepuluh Nopember

Information System, Student

Tri Andika Maulana, Institut Teknologi Sepuluh Nopember

Information System, Student

Hamzah Muhammad, Institut Teknologi Sepuluh Nopember

Information System, Student

Nur Aini Rakhmawati, Institut Teknologi Sepuluh Nopember

Information System, Lecturer

Published

2021-05-03

How to Cite

Wibowo, N. I., Maulana, T. A., Muhammad, H., & Rakhmawati, N. A. (2021). Perbandingan Algoritma Klasifikasi Sentimen Twitter Terhadap Insiden Kebocoran Data Tokopedia. JISKA (Jurnal Informatika Sunan Kalijaga), 6(2), 120–129. https://doi.org/10.14421/jiska.2021.6.2.120-129

Section

Articles