An Efficient Journal Articles Searching using Vector Space Model Algorithm

Keywords

digital library
scraping
tf-idf
vector space model
searching

How to Cite

Alvriyanto, A., Nuruzzaman, M. T., Siregar, M. U., & Hidayat, R. (2020). An Efficient Journal Articles Searching using Vector Space Model Algorithm. IJID (International Journal on Informatics for Development), 9(1), 21–28. https://doi.org/10.14421/ijid.2020.09104

Abstract

One of the main features of a digital library is a search engine that depends on keywords submitted by a user. However, with the traditional algorithm, computational performance, in particular searching speed, relies significantly on the number of journal articles stored in the databases. Irrelevant search results also slow down the article searching process. To solve this problem, we propose a vector space model (VSM) algorithm to search for relevant journal articles. The VSM algorithm uses term frequency-inverse document frequency (TF-IDF) weighting. It is compared against a baseline, namely the traditional algorithm. Both algorithms are evaluated using combinations of keywords that can be a synonym, a phrase, a typographical error, or a suffix and prefix. Using a dataset of 635 journal articles, the two algorithms are compared on 11 evaluation criteria. The results show that the VSM algorithm retrieves the intended journal at rank 5 on average, whereas the traditional algorithm retrieves it at rank 171 on average. Therefore, the proposed algorithm sorts journal articles by the submitted keywords more accurately than the traditional algorithm.
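
To make the ranking idea in the abstract concrete, the following is a minimal sketch of TF-IDF weighting with cosine-similarity ranking. It is not the authors' implementation. The function names (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `rank_articles`) are hypothetical, and the weighting (raw term frequency multiplied by log(N/df)) is one common variant assumed here, since the abstract does not state the exact formula. Preprocessing steps such as stop-word removal and stemming are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization only (no stemming or stop words).
    return text.lower().split()


def build_idf(docs_tokens):
    # Inverse document frequency: log(N / df) for every term in the collection.
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector as a dict; terms unknown to the collection are dropped.
    term_freq = Counter(tokens)
    return {t: term_freq[t] * idf[t] for t in term_freq if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either has zero length.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(query, documents):
    # Return document indices ordered from most to least similar to the query.
    docs_tokens = [tokenize(d) for d in documents]
    idf = build_idf(docs_tokens)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [
        (cosine(query_vec, tfidf_vector(tokens, idf)), index)
        for index, tokens in enumerate(docs_tokens)
    ]
    # Sort by descending score, breaking ties by original position.
    return [index for _, index in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Calling `rank_articles(query, titles)` on a list of article titles or abstracts returns the indices in ranked order, so the position of the intended article in that list corresponds to the rank reported in the evaluation.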

https://doi.org/10.14421/ijid.2020.09104

Creative Commons License
IJID (International Journal on Informatics for Development) is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License