Optimizing K-Means Algorithm Using the Purity Method for Clustering Oil Palm Producing Regions
DOI:
https://doi.org/10.14421/jiska.2025.10.1.1-15
Keywords:
K-Means Algorithm, Purity Method, Data Clustering, Oil Palm Production, Davies-Bouldin Index (DBI)
Abstract
The K-Means algorithm is a fundamental machine learning tool, widely used for data clustering. This research aims to improve the performance of K-Means by integrating the Purity method, with a specific focus on clustering the oil palm producing regions of North Aceh. Oil palm cultivation is a vital agricultural sector in North Aceh, contributing significantly to the local economy and employment. The study compares two clustering techniques: the conventional K-Means algorithm and an optimized version, Purity K-Means. Integrating the Purity method makes K-Means more efficient by reducing the number of iterations required for convergence. The data used for the clustering analysis come from the Department of Agriculture and Food of North Aceh Regency and cover oil palm production in 2023. The findings indicate that Purity K-Means both reduces the iteration count and improves cluster quality. The average Davies-Bouldin Index (DBI), for which lower values indicate more compact and better-separated clusters, is 0.45 for standard K-Means and 0.30 for Purity K-Means. Applying the Purity method also reduced the number of K-Means iterations from 15 to 3. These results show an improvement in both clustering quality and overall efficiency.
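The two quantities reported in the abstract, the iteration count and the Davies-Bouldin Index, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data in `POINTS` are invented (not the North Aceh production data), the function names `kmeans` and `davies_bouldin` are hypothetical, and the "spread" initialization is a hand-picked stand-in, because the abstract does not describe how the Purity method selects starting centroids. The sketch only shows how a different starting point can change both the number of iterations and the DBI of plain K-Means.

```python
import math

# Invented 2-D points forming three loose groups (NOT the study's data).
POINTS = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
    (1.0, 9.0), (1.5, 8.5), (0.5, 8.0),
]


def kmeans(points, init_centroids, max_iter=100):
    """Plain K-Means. Returns (labels, centroids, iterations).

    The iteration count includes the final pass that confirms
    no assignment changed.
    """
    centroids = list(init_centroids)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # nothing moved: converged
            return labels, centroids, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids, max_iter


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin Index; lower is better. Needs >= 2 non-empty clusters."""
    ks = sorted(set(labels))
    # Scatter: mean distance from a cluster's members to its centroid.
    scatter = {}
    for k in ks:
        members = [p for p, lab in zip(points, labels) if lab == k]
        scatter[k] = sum(math.dist(p, centroids[k]) for p in members) / len(members)
    # For each cluster, take its worst (largest) similarity to any other cluster.
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in ks if j != i
        )
        for i in ks
    ]
    return sum(worst) / len(ks)


if __name__ == "__main__":
    # Assumption: two hand-picked initializations, purely for comparison.
    inits = {
        "poor (three points from one group)": POINTS[:3],
        "spread (one point per group)": [POINTS[0], POINTS[3], POINTS[6]],
    }
    for name, init in inits.items():
        labels, centroids, iters = kmeans(POINTS, init)
        dbi = davies_bouldin(POINTS, labels, centroids)
        print(f"{name}: iterations={iters}, DBI={dbi:.3f}")
```

Running the script prints the iteration count and DBI for each initialization, which is the same pair of measurements the abstract uses to compare standard K-Means with Purity K-Means.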
License
Copyright (c) 2025 Novia Hasdyna, Rozzi Kesuma Dinata, Balqis Yafis

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.