Patient Segmentation Based on Visit Patterns and Diagnoses Using the K-Means Clustering Algorithm on Medical Records from XYZ Clinic in 2024

Authors

  • Muhammad Solihin UIN Sunan Kalijaga
  • Titi Sari UIN Sunan Kalijaga
  • Sophia Carolina Shani

DOI:

https://doi.org/10.14421/jiehis.5377

Keywords:

Patient Segmentation, K-Means, DBSCAN, Hierarchical Clustering, Data Mining

Abstract

Outpatient clinics in Indonesia routinely generate extensive health data through patient visits; however, such data remain underutilized for strategic and clinical decision-making. This study aims to segment patients based on visit frequency, diagnosis codes, demographic characteristics, and payment types using three clustering techniques: K-Means, Agglomerative Hierarchical Clustering, and DBSCAN. The objective is to determine the most effective method for patient stratification in a primary healthcare setting. Patient visit data from Klinik Pratama UIN Sunan Kalijaga for the year 2024 were analyzed. K-Means produced the most granular structure with nine clusters, DBSCAN identified seven clusters including a noise group, while Hierarchical Clustering yielded three macro-clusters. Internal validation using Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index revealed Hierarchical Clustering as the optimal model, achieving the highest cluster cohesion and separation with a Silhouette Score of 0.502, Calinski-Harabasz Index of 2134.87, and Davies-Bouldin Index of 0.668. The dendrogram and principal component analysis visualization confirmed the natural separation into three clinically meaningful patient segments. Cluster 0 comprised patients with acute respiratory and digestive conditions exhibiting sporadic visits. Cluster 1 consisted predominantly of male BPJS-insured patients with musculoskeletal and dental complaints and moderate visit frequency. Cluster 2 included female BPJS-insured patients with chronic metabolic and vascular diseases requiring consistent and frequent care. These findings demonstrate the efficacy of hierarchical clustering in producing interpretable patient segments and provide a valuable foundation for targeted healthcare management and resource allocation in outpatient clinics.
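
The three internal validation indices named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the toy points, and the two labelings below are hypothetical and only show how a Silhouette Score, a Davies-Bouldin Index, and a Calinski-Harabasz Index could be computed for a given cluster assignment using Euclidean distance.

```python
from math import dist
from statistics import fmean


def _groups(points, labels):
    """Group points by their cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return groups


def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(fmean(axis) for axis in zip(*pts))


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; singletons score 0."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fmean([dist(p, q) for q in other])
                for other_lab, other in groups.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return fmean(scores)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    groups = _groups(points, labels)
    cents = {lab: _centroid(g) for lab, g in groups.items()}
    scatter = {lab: fmean([dist(p, cents[lab]) for p in g])
               for lab, g in groups.items()}
    return fmean([
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in groups if j != i)
        for i in groups
    ])


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion, scaled by degrees of freedom."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = _centroid(points)
    between = sum(len(g) * dist(_centroid(g), overall) ** 2 for g in groups.values())
    within = sum(dist(p, _centroid(g)) ** 2 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two visibly separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # labeling that follows the visible groups
bad = [0, 1, 0, 1, 0, 1]   # labeling that mixes the groups

for name, labels in (("good", good), ("bad", bad)):
    print(name,
          silhouette(points, labels),
          davies_bouldin(points, labels),
          calinski_harabasz(points, labels))
```

Read this way, a higher Silhouette Score and Calinski-Harabasz Index together with a lower Davies-Bouldin Index indicate more cohesive and better separated clusters, which is the basis on which the abstract ranks the three methods.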

Published

2025-12-31

How to Cite

Patient Segmentation Based on Visit Patterns and Diagnoses Using the K-Means Clustering Algorithm on Medical Records from XYZ Clinic in 2024. (2025). Journal of Industrial Engineering and Halal Industries, 6(2), 30-40. https://doi.org/10.14421/jiehis.5377
