Optimizing K-Means Algorithm Using the Purity Method for Clustering Oil Palm Producing Regions

Novia Hasdyna; Rozzi Kesuma Dinata; Balqis Yafis

doi:10.14421/jiska.2025.10.1.1-15

Authors

Novia Hasdyna, Universitas Islam Kebangsaan Indonesia
Rozzi Kesuma Dinata, Universitas Malikussaleh
Balqis Yafis, National Yang Ming Chiao Tung University

Keywords:

K-Means Algorithm, Purity Method, Data Clustering, Oil Palm Production, Davies-Bouldin Index (DBI)

Abstract

The K-Means algorithm is a fundamental machine learning tool that is widely used for data clustering. This research aims to improve the performance of K-Means by integrating the Purity method, with a specific focus on clustering oil palm producing regions in North Aceh. Oil palm cultivation is a vital agricultural sector in North Aceh and contributes significantly to the local economy and employment. This study compares two clustering techniques: the conventional K-Means algorithm and an optimized version, Purity K-Means. Integrating the Purity method makes K-Means more efficient by reducing the number of iterations required to converge. The clustering data cover oil palm production in 2023 and were obtained from the Department of Agriculture and Food of North Aceh Regency. The findings indicate that Purity K-Means both reduces the iteration count and improves cluster quality. The average Davies-Bouldin Index (DBI), for which lower values indicate more compact and better-separated clusters, is 0.45 for standard K-Means and 0.30 for Purity K-Means. The number of K-Means iterations fell from 15 to 3. These results indicate gains in both clustering quality and efficiency.
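The two quantities reported in the abstract, the iteration count and the Davies-Bouldin Index, can be measured with a short sketch such as the one below. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe how the Purity method selects initial centroids, so the sketch simply compares a poor and a well-spread hand-picked initialization as a stand-in. The data points are synthetic, and the function names (`kmeans`, `davies_bouldin`) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, iterations).

    The iteration count includes the final pass that confirms the
    assignments no longer change.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centroids, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids, max_iter

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin Index; lower is better. Assumes no cluster is empty."""
    k = len(centroids)
    # Scatter of a cluster: mean distance of its members to the centroid.
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        scatter.append(sum(dist(p, centroids[j]) for p in members) / len(members))
    # For each cluster, take its worst similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Synthetic 2-D data with three visible groups (not the paper's dataset).
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (2.0, 1.0),
    (8.0, 8.0), (8.5, 9.0), (9.0, 8.0), (8.0, 8.5),
    (1.0, 9.0), (1.5, 8.0), (2.0, 9.0), (1.0, 8.5),
]

# Poor start: all three centroids taken from the same group.
poor_init = points[:3]
# Well-spread start: one centroid per group (stand-in for a better initializer).
good_init = [points[0], points[4], points[8]]

labels_poor, cents_poor, iters_poor = kmeans(points, poor_init)
labels_good, cents_good, iters_good = kmeans(points, good_init)
dbi_poor = davies_bouldin(points, labels_poor, cents_poor)
dbi_good = davies_bouldin(points, labels_good, cents_good)

print("poor init:", iters_poor, "iterations, DBI =", round(dbi_poor, 3))
print("good init:", iters_good, "iterations, DBI =", round(dbi_good, 3))
```

On this toy data the well-spread start should converge at least as quickly and yield a lower DBI than the poor start, which is the kind of effect the abstract attributes to better centroid selection. The sketch says nothing about the magnitude of the improvement on the North Aceh data.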
