Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/4817
Title: | Scalable incremental fuzzy consensus clustering algorithm for handling big data |
Authors: | Jha, Preeti; Tiwari, Aruna; Bharill, Neha; Mounika, Mukkamalla |
Keywords: | Cluster analysis;Data streams;Digital storage;Iterative methods;Large dataset;Cluster framework;Consensus clustering;Distributed data streams;Heterogeneous data clustering;Iterative algorithm;Quality segments;Real-world datasets;Robust clustering;Clustering algorithms |
Issue Date: | 2021 |
Publisher: | Springer Science and Business Media Deutschland GmbH |
Citation: | Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Nagendra, N., & Mounika, M. (2021). Scalable incremental fuzzy consensus clustering algorithm for handling big data. Soft Computing, 25(13), 8703-8719. doi:10.1007/s00500-021-05733-1 |
Abstract: | Consensus clustering can produce novel, stable, and robust clustering results. It aims to merge several existing basic segments into a single coordinated one, and it is widely regarded as a promising solution for heterogeneous data clustering on big data. Even though many clustering algorithms have been proposed, obtaining a good-quality segment efficiently remains an open problem. In this paper, we propose a scalable incremental fuzzy consensus clustering (SIFCC) algorithm for a big data framework. It is implemented on the Apache Spark cluster framework, a distributed data stream environment that handles big data by treating the data as a set of data subsets that are processed incrementally. Spark is well suited to iterative algorithms because it supports in-memory computation and scalability. SIFCC not only facilitates efficient big data clustering, but also improves the quality of clusters, optimizes storage space, and reduces time complexity during clustering. To establish a comparison, we designed and implemented a scalable version of the existing fuzzy consensus clustering (FCC) algorithm on an Apache Spark cluster, named scalable fuzzy consensus clustering (SFCC). Extensive experiments on real-world datasets show that the SIFCC algorithm has greater potential for clustering big data than SFCC. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature. |
URI: | https://doi.org/10.1007/s00500-021-05733-1 https://dspace.iiti.ac.in/handle/123456789/4817 |
ISSN: | 1432-7643 |
Type of Material: | Journal Article |
Appears in Collections: | Department of Computer Science and Engineering |
Files in This Item:
There are no files associated with this item.