A Machine Learning Framework for Improving Classification Performance on Credit Approval


Data Mining
Information Gain
Naïve Bayes

How to Cite

Prastyo, P. H., Prasetyo, S. E., & Arti, S. (2021). A Machine Learning Framework for Improving Classification Performance on Credit Approval. IJID (International Journal on Informatics for Development), 10(1), 47–52. https://doi.org/10.14421/ijid.2021.2384


Credit scoring models are commonly used to decide whether a loan request should be accepted or refused. The model depends on the type of loan or credit and draws on a range of credit factors. At present, there is no sufficiently accurate model for determining which applicants are eligible for loans, so an accurate, automatic model is needed to help banks identify suitable applicants. To address this problem, we propose an approach that combines a machine learning algorithm (Naïve Bayes), Information Gain (IG), and discretization to classify loan applicants. The study used an experimental method carried out in the Weka application. The dataset was the Australian Credit Approval data, which contains 690 instances. Information Gain serves as the feature selection method, retaining the relevant features so that the Naïve Bayes algorithm can work optimally. Results are evaluated with a confusion matrix and validated with 10-fold cross-validation. In the experiments, the proposed method improved classification performance and reached the highest average accuracy, precision, recall, and F-measure, with values of 86.29%, 86.33%, 86.29%, and 86.30%, respectively. The proposed method also obtained an ROC area of 91.52%, which places it in the excellent classification category.
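As a rough illustration of the feature selection step described above, the following Python sketch discretizes a numeric feature and scores it by Information Gain against the class label. It is not the authors' Weka workflow: the equal-width binning scheme, the function names, and the toy data are all assumptions made for this example, and the abstract does not say which discretization method was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def equal_width_bins(values, n_bins=3):
    """Map numeric values to bin indices using equal-width intervals.

    Equal-width binning is an assumption for this sketch only.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature, labels):
    """IG = H(class) - sum over values v of p(v) * H(class | feature = v)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: 1 = approved, 0 = refused.
approved = [0, 0, 0, 1, 1, 1]
income = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]  # separates the classes well
noise = [0, 1, 0, 1, 0, 1]                # nearly unrelated to the class

ig_income = information_gain(equal_width_bins(income, n_bins=2), approved)
ig_noise = information_gain(noise, approved)

# Rank features by score; a selection step would keep the top-ranked ones.
ranking = sorted({"income": ig_income, "noise": ig_noise}.items(),
                 key=lambda item: item[1], reverse=True)
print(ranking)
```

Features with a higher score reduce more of the uncertainty about the class, so a ranking like this one can be cut at a chosen threshold before the remaining features are passed to a Naïve Bayes classifier.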



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Copyright (c) 2021 IJID (International Journal on Informatics for Development)