Air Quality Prediction Using the CatBoost Method (Prediksi Kualitas Udara Menggunakan Metode CatBoost)

Authors

  • Mohamad Arif Abdul Syukur, UIN Maulana Malik Ibrahim Malang
  • Suhartono Suhartono, UIN Maulana Malik Ibrahim Malang
  • Totok Chamidy, UIN Maulana Malik Ibrahim Malang

DOI:

https://doi.org/10.14421/jiska.2025.10.2.249-258

Keywords:

Prediction, Air Quality, Gradient Boosting, CatBoost, GridSearchCV

Abstract

Air is essential for life, but industrial activity, forest fires, cigarette smoke, and transportation all contribute to air pollution. AirVisual AQI data for 2024 ranks Jakarta 11th globally for pollution, with an index reaching 127, a level that is unhealthy for sensitive groups and carries a risk of serious illness, including skin and respiratory diseases. This research uses the CatBoost method to predict the air quality index from Jakarta SPKU data obtained from Kaggle. The data is pre-processed and used to build four models that differ in their ratio of training to testing data. Each model is tuned over the parameters iterations, depth, learning rate, and l2_leaf_reg, with GridSearchCV used to find the best combination. The model trained on 90% of the data and tested on the remaining 10% achieves the best accuracy, 97%, a result attributed to its larger proportion of training data. These results indicate that the CatBoost method can yield accurate air quality predictions, which can support efforts to mitigate the impact of pollution and improve public health.

Published

2025-05-31

How to Cite

Syukur, M. A. A., Suhartono, S., & Chamidy, T. (2025). Prediksi Kualitas Udara Menggunakan Metode CatBoost. JISKA (Jurnal Informatika Sunan Kalijaga), 10(2), 249–258. https://doi.org/10.14421/jiska.2025.10.2.249-258

Section

Articles
