Approx-Ch: An Approximate Chameleon Clustering for Large-Scale and High-Dimensional Data

Singh, Priyanshu; Ahuja, Kapil

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/18335

Title:	Approx-Ch: An Approximate Chameleon Clustering for Large-Scale and High-Dimensional Data
Authors:	Singh, Priyanshu; Ahuja, Kapil
Issue Date:	2025
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Citation:	Singh, P., Ahuja, K., & Raha, S. (2025). Approx-Ch: An Approximate Chameleon Clustering for Large-Scale and High-Dimensional Data. Proceedings of 2025 IEEE 22nd India Council International Conference, INDICON 2025. https://doi.org/10.1109/INDICON68490.2025.11392902
Abstract:	Hierarchical clustering remains a fundamental challenge in data mining, particularly for real-world datasets: traditional approaches fail to scale when the data are large-scale and high-dimensional. Recent Chameleon clustering algorithms (Chameleon2, M-Chameleon, and INNGS-Chameleon) propose advanced strategies to address this challenge, but they still suffer from O(n^2) computational complexity. We address this by introducing Approximate-Chameleon (Approx-Ch), which has O(n log n) complexity. Our algorithm has three parts. First, Graph Generation: we use approximate k-NN search instead of the exact search used by the three earlier algorithms. This speeds up nearest-neighbor computation and significantly reduces graph generation time. Second, Graph Partitioning: we use a multi-level partitioning approach rather than the single-level one mostly used by the three prior works. This makes graph partitioning robust to the errors introduced by approximate graph generation and keeps configuration requirements minimal. Third, Merging: we follow Chameleon2, retaining its flood-fill heuristic and its merging criteria, since it is the cheapest of the three earlier algorithms. On the real-world benchmark datasets used in the three earlier works, Approx-Ch delivers an average improvement of 5% in clustering quality and reduces total run-time by 86%. This demonstrates that algorithmic efficiency and clustering quality can co-exist in large-scale hierarchical clustering. © 2025 IEEE.
URI:	https://dx.doi.org/10.1109/INDICON68490.2025.11392902 https://dspace.iiti.ac.in:8080/jspui/handle/123456789/18335
ISBN:	979-833159031-4
Type of Material:	Conference Paper
Appears in Collections:	Department of Computer Science and Engineering
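
The Graph Generation step described in the abstract replaces exact k-NN search with an approximate one. The abstract does not name the approximate method, so the following is only a minimal sketch of the general idea, not the authors' implementation. It contrasts a brute-force k-NN graph with an approximate one built by random-hyperplane hashing, a stand-in technique chosen here for illustration. The function names `exact_knn_graph` and `approx_knn_graph` and the parameters `n_planes` and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


def exact_knn_graph(points, k):
    """Brute-force k-NN graph: every pair is compared, so O(n^2) distances."""
    graph = {}
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        graph[i] = [j for _, j in sorted(dists)[:k]]
    return graph


def approx_knn_graph(points, k, n_planes=4, seed=0):
    """Approximate k-NN graph via random-hyperplane hashing.

    Illustrative stand-in only (an assumption, not the paper's method).
    Points are bucketed by the sign pattern of n_planes random projections,
    and neighbours are searched only inside each point's own bucket, so a
    node may receive fewer than k neighbours or miss a true neighbour.
    The hyperplanes pass through the origin, so centred data works best.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]

    # Hash each point to a bucket keyed by which side of each plane it lies on.
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(sum(w * x for w, x in zip(plane, p)) >= 0.0 for plane in planes)
        buckets[key].append(i)

    # Run the brute-force search within each bucket only.
    graph = {}
    for members in buckets.values():
        for i in members:
            dists = [(math.dist(points[i], points[j]), j) for j in members if j != i]
            graph[i] = [j for _, j in sorted(dists)[:k]]
    return graph
```

Restricting the search to a bucket trades some neighbour accuracy for fewer distance computations, which is the kind of error the abstract says the multi-level partitioning step is meant to tolerate. This sketch makes no claim about matching the O(n log n) bound reported for Approx-Ch.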
