Comparative Analysis of Text Mining Classification Algorithms for English and Indonesian Qur’an Translation


Naïve Bayes
text classification

How to Cite

Hidayat, R., & Minati, S. (2019). Comparative Analysis of Text Mining Classification Algorithms for English and Indonesian Qur’an Translation. IJID (International Journal on Informatics for Development), 8(1), 47–51.


The Qur'an, As-Sunnah, and classical Islamic books are the sources of knowledge, wisdom, and law for followers of Islam. In daily life, however, many Muslims still do not understand the meaning of the sentences in the Qur'an even though they read it every day. This poses a challenge for academics in science and engineering, especially informatics, to explore and represent knowledge through intelligent computing systems that can answer various questions based on knowledge from the Qur'an. This research creates an enabling computational environment for text mining the Qur'an, the purpose of which is to help people understand each verse of the Qur'an. The classification experiment uses the Support Vector Machine (SVM), Naïve Bayes, k-Nearest Neighbor (kNN), and J48 Decision Tree classifier algorithms, with the verses of Al-Baqarah translated into English and Indonesian as the dataset, labeled by the three most fundamental aspects of Islam: 'Iman' (faith), 'Ibadah' (worship), and 'Akhlaq' (virtues). The Indonesian translation was pre-processed using the Sastrawi package in Python, and the StringToWordVector filter in WEKA with the TF-IDF method was used to implement the algorithms. The classification experiments measure accuracy and F-measure, using a percentage split in which 66% of the data is used for training and the rest for testing. Based on the classification results, the SVM classifier has the best overall accuracy for the Indonesian translation, at 81.443%, and the Naïve Bayes classifier has the best accuracy for the English translation, at 78.35%.
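The evaluation setup described in the abstract (TF-IDF weighting, a 66% percentage split, accuracy, and F-measure) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' WEKA configuration: the function names, the toy tokens, and the toy labels are hypothetical, the TF-IDF variant (raw term frequency times log(N / document frequency)) is one common choice among several, and the split keeps the original order, whereas a tool such as WEKA may shuffle the data first.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Count in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def percentage_split(items, train_percent=66):
    """Use the first train_percent% of items for training, the rest for testing."""
    cut = round(len(items) * train_percent / 100)
    return items[:cut], items[cut:]


def accuracy(true_labels, predicted):
    """Fraction of instances whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    return correct / len(true_labels)


def f_measure(true_labels, predicted, label):
    """Harmonic mean of precision and recall for a single class label."""
    pairs = list(zip(true_labels, predicted))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper's dataset.
docs = [["faith", "prayer"], ["prayer", "charity"], ["faith", "faith", "virtue"]]
weights = tf_idf(docs)

train, test = percentage_split(list(range(100)))

gold = ["iman", "ibadah", "akhlaq", "iman"]
pred = ["iman", "ibadah", "iman", "iman"]

print(len(train), len(test))
print(accuracy(gold, pred), f_measure(gold, pred, "iman"))
```

In this sketch a term that occurs in every document receives a weight of zero, which is why TF-IDF tends to favor words that distinguish one class of verses from another.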



Creative Commons License
IJID (International Journal on Informatics for Development) is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License