Optimizing Financial Risk Prediction for Loan Approval Decisions
DOI:
https://doi.org/10.14421/jiska.6071
Keywords:
Financial Risk Prediction, Loan Approval, Feature Selection, Machine Learning, Classification
Abstract
Accurate financial risk prediction is essential for effective loan approval decision-making, particularly in data-driven financial systems. This study investigates how feature selection strategies influence the performance of machine learning models for loan approval prediction, using the publicly available Kaggle "Financial Risk for Loan Approval" synthetic dataset, which contains 20,000 applications. The experiments evaluated four feature selection paradigms (filter-based, wrapper-based, embedded, and PCA-informed) across six classification models, using stratified 10-fold cross-validation and imbalance-aware metrics. The results show that feature selection consistently improves predictive robustness and minority class recognition. Contrary to assumptions favoring complex models, Logistic Regression combined with Lasso regularization achieved the best overall predictive performance, with an ROC-AUC of 99.41% and an F1-score of 91.72%. Embedded feature selection methods provided the most favorable balance between accuracy and computational efficiency. These findings indicate that the effectiveness of feature selection depends heavily on its interaction with model complexity, and they provide empirical guidance for designing robust, interpretable financial risk prediction systems.
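To make the evaluation protocol concrete, the following is a minimal, standard-library-only sketch of embedded feature selection via L1 (Lasso) regularized logistic regression, scored with ROC-AUC and F1 under stratified 10-fold cross-validation. It is not the authors' code. The data are a small synthetic stand-in generated by the hypothetical helper `make_synthetic` (not the Kaggle dataset), the solver is plain proximal gradient descent chosen for brevity, and the values of `lam`, `lr`, and `epochs` are illustrative assumptions rather than settings reported in the study.

```python
import math
import random

def make_synthetic(n=300, seed=0):
    """Hypothetical stand-in data: features 0 and 1 are informative,
    features 2-5 are noise; roughly a quarter of labels are positive."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(6)]
        score = 2.0 * row[0] - 1.5 * row[1] - 1.5 + rng.gauss(0, 0.5)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y

def stratified_folds(y, k, seed=0):
    """Deal the indices of each class round-robin into k folds so every
    fold keeps approximately the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)

def predict_proba(w, b, row):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, epochs=150):
    """Proximal gradient descent on mean log-loss + lam * ||w||_1.
    Assumes features are already on a comparable (standardized) scale."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = predict_proba(w, b, row) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def cross_validate(X, y, k=10, lam=0.05):
    """Return mean ROC-AUC, mean F1, and the features kept in each fold."""
    aucs, f1s, kept = [], [], []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        w, b = fit_l1_logistic([X[i] for i in train_idx],
                               [y[i] for i in train_idx], lam=lam)
        truth = [y[i] for i in test_idx]
        probs = [predict_proba(w, b, X[i]) for i in test_idx]
        aucs.append(roc_auc(truth, probs))
        f1s.append(f1_score(truth, [int(p >= 0.5) for p in probs]))
        # Embedded selection: a feature is "selected" if its weight is nonzero.
        kept.append([j for j, wj in enumerate(w) if wj != 0.0])
    return sum(aucs) / k, sum(f1s) / k, kept

if __name__ == "__main__":
    X, y = make_synthetic()
    mean_auc, mean_f1, kept = cross_validate(X, y)
    print(f"mean ROC-AUC: {mean_auc:.3f}, mean F1: {mean_f1:.3f}")
    print("features kept in first fold:", kept[0])
```

Fitting the model inside each fold, rather than once on the full data, keeps the selected feature set from being influenced by the held-out applications. The numbers this sketch prints come from toy data and are not comparable to the figures reported above.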
License
Copyright (c) 2026 Khalifatur Rauf, Adi Cahyo Kuswijayanto, Ella Kristiantini Susan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors who publish with this journal agree to the following terms as stated in http://creativecommons.org/licenses/by-nc/4.0
a. Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
b. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
c. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.




