Comparison of Twitter Sentiment Classification Algorithms on the Tokopedia Data Leak Incident
DOI:
https://doi.org/10.14421/jiska.2021.6.2.120-129
Abstract
Public responses posted on Twitter in reaction to the Tokopedia data leak incident were used as a data set to compare the performance of three classifiers, trained with supervised learning, on the task of classifying the sentiment of the text. Each tweet was classified as positive, negative, or neutral. The study compares Random Forest, Support Vector Machine (SVM), and Logistic Regression classifiers. The data was scraped automatically and used to evaluate several models; the SVM-based model achieved the highest F1-score, 0.503583, making it the best-performing classifier of the three.
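As an illustration of the metric used for the comparison, the following is a minimal sketch of an F1-score computed over the three sentiment classes. It assumes macro averaging (the unweighted mean of the per-class F1-scores); the abstract does not state which averaging scheme the authors used. The function names `per_class_f1` and `macro_f1` and the toy labels are hypothetical and are not taken from the paper or its data set.

```python
# Sketch only: macro-averaged F1 for a three-class sentiment task.
# Assumption: macro averaging; the paper's averaging scheme is not stated here.

LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, label):
    """F1-score for one class, treating that class as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the F1-score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1-scores."""
    return sum(per_class_f1(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy example (hypothetical labels, not from the Tokopedia data set).
y_true = ["positive", "negative", "neutral", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "neutral"]
score = macro_f1(y_true, y_pred)
print(round(score, 6))
```

With macro averaging, each class contributes equally regardless of how many tweets it contains, so weak performance on a small class lowers the overall score as much as weak performance on a large one.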
License
Copyright (c) 2021 Nadhif Ikbar Wibowo, Tri Andika Maulana, Hamzah Muhammad, Nur Aini Rakhmawati
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.