Implementation of Deep Learning for Entity Matching on Drug Datasets (Case Study of K24 and Farmaku)

Rivanda Putra Pratama; Rahmat Hidayat; Nisrina Fadhilah Fano; Adam Akbar; Nur Aini Rakhmawati

doi:10.14421/jiska.2021.6.3.130-138

Authors

Rivanda Putra Pratama, Department of Information Systems, Institut Teknologi Sepuluh Nopember
Rahmat Hidayat, Department of Information Systems, Institut Teknologi Sepuluh Nopember
Nisrina Fadhilah Fano, Department of Information Systems, Institut Teknologi Sepuluh Nopember
Adam Akbar, Department of Information Systems, Institut Teknologi Sepuluh Nopember
Nur Aini Rakhmawati, Department of Information Systems, Institut Teknologi Sepuluh Nopember

Keywords:

Entity Matching, Deep Learning, DeepMatcher, Dataset, Hybrid

Abstract

Fast data processing helps companies speed up their analysis. Entity matching is one computational step in that processing: it determines whether two different records refer to the same entity. The task becomes difficult when the datasets being compared are large, and deep learning is one way to address it. DeepMatcher is a Python package, built on a deep learning model architecture, for solving entity matching problems. This study applies DeepMatcher to match drug records between two datasets collected from farmaku.com and k24klik.com, using the Hybrid model for the comparison. Based on the test results, the Hybrid model produces accurate results, so the entity matching used in this study runs well. The best result came from the 10th training, with an F1 value of 30.30, a precision value of 17.86, and a recall value of 100.
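The three reported metrics can be cross-checked against each other, because F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that check, not the authors' evaluation code; the function name `f1_score` and the variable names are illustrative assumptions, and the values are assumed to be on a 0 to 100 scale as printed in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Both inputs must use the same scale (here 0-100, an assumption
    based on how the abstract prints its values).
    """
    if precision + recall == 0:
        # Avoid division by zero when nothing was matched correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 17.86
reported_recall = 100.0

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 recomputed from precision and recall: {f1:.4f}")
```

The recomputed value is approximately 30.3, which agrees with the reported F1 up to rounding. The combination of a recall of 100 and a low precision means the model found every true match but also labeled many non-matching pairs as matches.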
