Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/18335
| Title: | Approx-Ch: An Approximate Chameleon Clustering for Large-Scale and High-Dimensional Data |
| Authors: | Singh, Priyanshu; Ahuja, Kapil |
| Issue Date: | 2025 |
| Publisher: | Institute of Electrical and Electronics Engineers Inc. |
| Citation: | Singh, P., Ahuja, K., & Raha, S. (2025). Approx-Ch: An Approximate Chameleon Clustering for Large-Scale and High-Dimensional Data. Proceedings of 2025 IEEE 22nd India Council International Conference, INDICON 2025. https://doi.org/10.1109/INDICON68490.2025.11392902 |
| Abstract: | Hierarchical clustering remains a fundamental challenge in data mining, particularly on real-world datasets, where traditional approaches fail to scale when the data are large-scale and high-dimensional. Recent Chameleon clustering algorithms (Chameleon2, M-Chameleon, and INNGS-Chameleon) have proposed advanced strategies to address this challenge. However, they still suffer from O(n²) computational complexity. We address this by introducing Approximate-Chameleon (Approx-Ch), which has O(n log n) complexity. Our algorithm has three parts. First, Graph Generation: we use approximate k-NN search instead of the exact search used by the three earlier algorithms. This makes nearest-neighbor computation fast and significantly reduces graph generation time. Second, Graph Partitioning: we use a multi-level partitioning approach instead of the single-level one used by most of the three prior works. This change makes graph partitioning robust to the errors introduced by approximate graph generation and keeps configuration requirements minimal. Third, Merging: we follow Chameleon2, retaining its flood-fill heuristic and its merging criteria, since it is the cheapest of the three earlier algorithms. On the real-world benchmark datasets used in the three earlier works, Approx-Ch improves clustering quality by 5% on average and reduces total run time by 86%. This demonstrates that algorithmic efficiency and clustering quality can co-exist in large-scale hierarchical clustering. © 2025 IEEE. |
| URI: | https://dx.doi.org/10.1109/INDICON68490.2025.11392902; https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18335 |
| ISBN: | 979-833159031-4 |
| Type of Material: | Conference Paper |
| Appears in Collections: | Department of Computer Science and Engineering |
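The abstract's first stage replaces exact k-NN search with an approximate one to avoid comparing all O(n²) point pairs. The abstract does not say which approximate search method Approx-Ch uses, so the sketch below swaps in a simple stand-in: points are sorted along a few random 1-D projections, and only points that land near each other in some sorted order are compared exactly. Each projection costs one O(n log n) sort, and each point then computes exact distances to at most 2 · `window` · `num_projections` candidates. The function name `approx_knn_graph` and its parameters (`num_projections`, `window`, `seed`) are hypothetical illustrations, not the paper's API or its actual method.

```python
import heapq
import math
import random


def approx_knn_graph(points, k, num_projections=8, window=None, seed=0):
    """Approximate k-NN graph via sorted random 1-D projections.

    Hypothetical stand-in for an approximate k-NN search; not Approx-Ch's
    actual method. Returns a dict mapping each point index to a list of
    (neighbor_index, distance) pairs, at most k per point, nearest first.
    """
    n = len(points)
    dim = len(points[0])
    # Assumed default: inspect k positions on each side in every sorted order.
    window = window if window is not None else k
    rng = random.Random(seed)
    candidates = [set() for _ in range(n)]

    for _ in range(num_projections):
        # Random Gaussian direction; nearby points tend to stay close in 1-D.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        order = sorted(
            range(n),
            key=lambda i: sum(a * b for a, b in zip(points[i], direction)),
        )
        # Neighbors in sorted order become candidates (O(n * window) work).
        for pos, i in enumerate(order):
            lo, hi = max(0, pos - window), min(n, pos + window + 1)
            for j in order[lo:hi]:
                if j != i:
                    candidates[i].add(j)

    graph = {}
    for i in range(n):
        # Exact distances only on the small candidate set, not on all n points.
        scored = ((math.dist(points[i], points[j]), j) for j in candidates[i])
        graph[i] = [(j, d) for d, j in heapq.nsmallest(k, scored)]
    return graph


# Usage: two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
graph = approx_knn_graph(pts, k=2, num_projections=4)
print(graph[0])  # approximate 2-NN list of point 0
```

Raising `num_projections` or `window` trades speed for recall; setting `window` to at least n makes every pair a candidate, which recovers the exact k-NN graph at O(n²) cost.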
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.