A Comprehensive Evaluation of Rough Sets Clustering in Uncertainty Driven Contexts

  • A. Szederjesi-Dragomir, Department of Computer Science, Babeș-Bolyai University, 1 M. Kogălniceanu Street, 400084 Cluj-Napoca, Romania

Abstract

This paper presents a comprehensive evaluation of the Agent BAsed Rough sets Clustering (ABARC) algorithm, an approach that uses rough sets theory for clustering in environments characterized by uncertainty. Several experiments on standard datasets compare ABARC against a range of supervised and unsupervised learning algorithms, using various internal and external performance measures to assess clustering quality. The results highlight ABARC's ability to handle vague data and outliers effectively, demonstrating its advantage when dealing with uncertainty in data. They also emphasize the importance of choosing appropriate performance metrics, especially when evaluating clustering algorithms on unclear or inconsistent data.
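As a concrete illustration of the two ingredients mentioned above, the listing below is a minimal, hypothetical sketch rather than the ABARC implementation: it derives rough-set-style lower and upper cluster approximations from ordinary k-means distances, in the spirit of rough k-means, and then evaluates the crisp labelling with one internal measure (silhouette) and one external measure (adjusted Rand index) on the Iris data using scikit-learn. The distance-ratio threshold is an assumed illustrative value, not a parameter taken from the paper.

# Hypothetical sketch: rough-set-style cluster approximations plus
# internal/external evaluation metrics. Not the ABARC algorithm itself.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = load_iris(return_X_y=True)

# Ordinary k-means gives crisp labels and the centers we approximate from.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels, centers = kmeans.labels_, kmeans.cluster_centers_

# Rough-set-style assignment: if the second-nearest center is almost as close
# as the nearest one (ratio below a threshold), the object is considered
# ambiguous and placed only in the upper approximations of both clusters;
# otherwise it belongs to the lower approximation of its nearest cluster.
# The threshold value is illustrative, not taken from the paper.
threshold = 1.2
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
order = np.argsort(dists, axis=1)
nearest, second = order[:, 0], order[:, 1]
idx = np.arange(len(X))
ambiguous = dists[idx, second] / dists[idx, nearest] < threshold

lower = [np.where((nearest == k) & ~ambiguous)[0] for k in range(3)]
upper = [np.where((nearest == k) | ((second == k) & ambiguous))[0] for k in range(3)]
print("objects in some lower approximation:", sum(len(l) for l in lower))
print("boundary (ambiguous) objects:", int(ambiguous.sum()))

# Internal measure: silhouette uses only the data and the crisp labels.
print("silhouette:", silhouette_score(X, labels))
# External measure: adjusted Rand index compares labels with the known classes.
print("adjusted Rand index:", adjusted_rand_score(y_true, labels))

In this sketch the boundary (ambiguous) objects are exactly the ones a rough clustering method refuses to commit to a single cluster, which is also why internal and external metrics can disagree on such data.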

Published
2024-04-11
How to Cite
Szederjesi-Dragomir, A. A Comprehensive Evaluation of Rough Sets Clustering in Uncertainty Driven Contexts. Studia Universitatis Babeș-Bolyai Informatica 69, 1 (Apr. 2024), 41–56. ISSN 2065-9601. Available at: https://www.cs.ubbcluj.ro/~studia-i/journal/journal/article/view/96. doi: https://doi.org/10.24193/subbi.2024.1.03.