Fuzzy based clustering algorithms to handle big data with implementation on Apache Spark

Bharill, Neha; Tiwari, Aruna

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/368

Title:	Fuzzy based clustering algorithms to handle big data with implementation on Apache Spark
Authors:	Bharill, Neha; Tiwari, Aruna
Keywords:	Algorithms;Artificial intelligence;Big data;Cluster analysis;Copying;Fuzzy clustering;Fuzzy systems;Iterative methods;Learning systems;Optimization;Computational requirements;Fuzzy C-means algorithms;Iterative algorithm;Iterative Optimization;Objective functions;Partitional clustering;Partitional clustering algorithm;Scalable machine learning;Clustering algorithms
Issue Date:	2016
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Citation:	Bharill, N., Tiwari, A., & Malviya, A. (2016). Fuzzy based clustering algorithms to handle big data with implementation on Apache Spark. Paper presented at the Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016, 95-104. doi:10.1109/BigDataService.2016.34
Series/Report no.:	CP1;
Abstract:	With advances in technology, a huge amount of data containing useful information, called Big Data, is generated on a daily basis. Processing such a tremendous volume of data requires Big Data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than conventional frameworks like Hadoop MapReduce. For the effective analysis and interpretation of this data, scalable Machine Learning methods are required to overcome the space and time bottlenecks. Partitional clustering algorithms are widely adopted by researchers for clustering large datasets due to their low computational requirements. Thus, we focus on the design of a partitional clustering algorithm and its implementation on Apache Spark. In this paper, we propose a partitional clustering algorithm called the Scalable Random Sampling with Iterative Optimization Fuzzy c-Means algorithm (SRSIO-FCM), which is implemented on Apache Spark to handle the challenges associated with Big Data clustering. Experiments are performed on several big datasets to show the effectiveness of SRSIO-FCM in comparison with a proposed scalable version of the Literal Fuzzy c-Means (LFCM) algorithm, called SLFCM, which is also implemented on Apache Spark. The comparative results are reported in terms of F-measure, ARI, objective function value, run-time, and scalability. The reported results show the great potential of SRSIO-FCM for Big Data clustering. © 2016 IEEE.
URI:	https://doi.org/10.1109/BigDataService.2016.34 ; https://dspace.iiti.ac.in/handle/123456789/368
Type of Material:	Conference Paper
Appears in Collections:	Department of Computer Science and Engineering
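
Illustrative note: the Literal Fuzzy c-Means (LFCM) baseline named in the abstract is generally understood to be the standard fuzzy c-means iteration run over the full dataset. The block below is a minimal single-machine sketch of that standard iteration in plain Python, assuming squared Euclidean distance and a fuzzifier m = 2. It is not the SRSIO-FCM or SLFCM algorithm from the paper and does not use Apache Spark; the function name fuzzy_c_means and all of its parameters are hypothetical choices made for this example.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric tuples.

    Assumptions (not from the paper): squared Euclidean distance,
    initial centers drawn at random from the data, fuzzifier m > 1.
    Returns (centers, memberships); memberships[j][i] is the degree to
    which point j belongs to cluster i, computed against the centers
    in use at the start of the final iteration.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)),
        # normalized so each point's memberships sum to 1.
        memberships = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
            zeros = [i for i, d in enumerate(d2) if d == 0.0]
            if zeros:
                # The point coincides with one or more centers.
                row = [1.0 / len(zeros) if i in zeros else 0.0
                       for i in range(c)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)

        # Center update: mean of the points weighted by u_ij^m.
        new_centers = []
        for i in range(c):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * x[k] for w, x in zip(weights, points)) / total
                for k in range(dim)
            ])

        # Stop once no center moves by more than tol (squared distance).
        shift = max(sum((a - b) ** 2 for a, b in zip(old, new))
                    for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    found_centers, found_memberships = fuzzy_c_means(data, c=2)
    print(sorted(found_centers))
```

Each iteration here touches every point, which is the cost that scalable variants such as the one described in the abstract aim to reduce.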

Files in This Item:

File	Description	Size	Format
CP1.pdf (Restricted Access)		588.61 kB	Adobe PDF
