Implementation of Deep Learning for Entity Matching on Drug Datasets (Case Study of K24 and Farmaku)
DOI: https://doi.org/10.14421/jiska.2021.6.3.130-138
Keywords: Entity Matching, Deep Learning, DeepMatcher, Dataset, Hybrid
Abstract
Fast data processing helps companies speed up their analysis. Entity matching is a computational task within data processing that determines whether two different records refer to the same entity. The task becomes difficult when the datasets being compared are large, and deep learning is one way to address it. DeepMatcher is a Python package, built on a deep learning model architecture, that is designed for entity matching problems. The purpose of this study was to match two datasets by applying DeepMatcher to drug data from farmaku.com and k24klik.com. The comparison model used is the Hybrid model. Based on the test results, the Hybrid model produced accuracy figures indicating that the entity matching in this study ran well. The best result was obtained in the 10th training run, with an F1 value of 30.30, a precision value of 17.86, and a recall value of 100.
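The three reported metrics are linked by the standard definition of F1 as the harmonic mean of precision and recall. The sketch below illustrates that relationship. The confusion counts in it (5 true positives, 23 false positives, 0 false negatives) are hypothetical values chosen only because they reproduce the reported figures after rounding; they are not taken from the paper, and the function name `prf` is likewise an illustrative choice.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return precision, recall, and F1 as percentages from confusion counts."""
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts (an assumption, not reported in the paper).
precision, recall, f1 = prf(tp=5, fp=23, fn=0)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these assumed counts, every true match is found (recall of 100) while most predicted matches are false positives, which is the pattern a low precision with a perfect recall describes.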
License
Copyright (c) 2021 Rivanda Putra Pratama, Rahmat Hidayat, Nisrina Fadhilah Fano, Adam Akbar, Nur Aini Rakhmawati
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms, as stated in http://creativecommons.org/licenses/by-nc/4.0:
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.