Scalable incremental fuzzy consensus clustering algorithm for handling big data

Jha, Preeti; Tiwari, Aruna; Bharill, Neha; Mounika, Mukkamalla

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4817

Title:	Scalable incremental fuzzy consensus clustering algorithm for handling big data
Authors:	Jha, Preeti; Tiwari, Aruna; Bharill, Neha; Mounika, Mukkamalla
Keywords:	Cluster analysis;Data streams;Digital storage;Iterative methods;Large dataset;Cluster framework;Consensus clustering;Distributed data streams;Heterogeneous data clustering;Iterative algorithm;Quality segments;Real-world datasets;Robust clustering;Clustering algorithms
Issue Date:	2021
Publisher:	Springer Science and Business Media Deutschland GmbH
Citation:	Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Nagendra, N., & Mounika, M. (2021). Scalable incremental fuzzy consensus clustering algorithm for handling big data. Soft Computing, 25(13), 8703-8719. doi:10.1007/s00500-021-05733-1
Abstract:	Consensus clustering can produce novel, stable, and robust clustering results. It aims to merge several existing basic segments into a single coordinated one, and it is widely regarded as a promising approach to heterogeneous data clustering for big data. Although many clustering algorithms have been proposed, obtaining a good-quality segment efficiently remains an open problem. In this paper, we propose a scalable incremental fuzzy consensus clustering (SIFCC) algorithm for a big data framework. It is implemented on the Apache Spark cluster framework, a distributed data stream environment that handles big data by treating the data as a set of subsets that are processed incrementally. Spark works well for iterative algorithms because it supports in-memory computation and scalability. SIFCC not only enables efficient big data clustering but also improves the quality of the clusters, optimizes storage space, and improves time complexity during clustering. For comparison, we designed and implemented a scalable version of the existing fuzzy consensus clustering (FCC) algorithm on an Apache Spark cluster, named scalable fuzzy consensus clustering (SFCC). Extensive experiments on real-world datasets show that the SIFCC algorithm has greater potential for clustering big data than SFCC. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
URI:	https://doi.org/10.1007/s00500-021-05733-1 https://dspace.iiti.ac.in/handle/123456789/4817
ISSN:	1432-7643
Type of Material:	Journal Article
Appears in Collections:	Department of Computer Science and Engineering
