Fuzzy based clustering algorithms to handle big data with implementation on Apache Spark

Bharill, Neha; Tiwari, Aruna

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/368

Full metadata record

DC Field	Value	Language
dc.contributor.author	Bharill, Neha	en_US
dc.contributor.author	Tiwari, Aruna	en_US
dc.date.accessioned	2016-10-25T05:38:05Z	-
dc.date.available	2016-10-25T05:38:05Z	-
dc.date.issued	2016	-
dc.identifier.citation	Bharill, N., Tiwari, A., & Malviya, A. (2016). Fuzzy based clustering algorithms to handle big data with implementation on apache spark. Paper presented at the Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016, 95-104. doi:10.1109/BigDataService.2016.34	en_US
dc.identifier.other	EID(2-s2.0-84973650084)	-
dc.identifier.uri	https://doi.org/10.1109/BigDataService.2016.34	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/368	-
dc.description.abstract	With advances in technology, a huge amount of data containing useful information, called Big Data, is generated on a daily basis. Processing such a tremendous volume of data requires Big Data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark can run up to 100 times faster than conventional frameworks like Hadoop MapReduce. For the effective analysis and interpretation of this data, scalable Machine Learning methods are required to overcome the space and time bottlenecks. Partitional clustering algorithms are widely adopted by researchers for clustering large datasets because of their low computational requirements. Thus, we focus on the design of a partitional clustering algorithm and its implementation on Apache Spark. In this paper, we propose a partitional clustering algorithm called the Scalable Random Sampling with Iterative Optimization Fuzzy c-Means algorithm (SRSIO-FCM), which is implemented on Apache Spark to handle the challenges associated with Big Data clustering. Experiments are performed on several big datasets to show the effectiveness of SRSIO-FCM in comparison with a proposed scalable version of the Literal Fuzzy c-Means (LFCM) algorithm, called SLFCM, which is also implemented on Apache Spark. The comparative results are reported in terms of F-measure, Adjusted Rand Index (ARI), objective function value, run time, and scalability. The reported results indicate that SRSIO-FCM has strong potential for Big Data clustering. © 2016 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.ispartofseries	CP1;	en_US
dc.source	Proceedings - 2016 IEEE 2nd International Conference on Big Data Computing Service and Applications, BigDataService 2016	en_US
dc.subject	Algorithms	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Big data	en_US
dc.subject	Cluster analysis	en_US
dc.subject	Copying	en_US
dc.subject	Fuzzy clustering	en_US
dc.subject	Fuzzy systems	en_US
dc.subject	Iterative methods	en_US
dc.subject	Learning systems	en_US
dc.subject	Optimization	en_US
dc.subject	Computational requirements	en_US
dc.subject	Fuzzy C-means algorithms	en_US
dc.subject	Iterative algorithm	en_US
dc.subject	Iterative Optimization	en_US
dc.subject	Objective functions	en_US
dc.subject	Partitional clustering	en_US
dc.subject	Partitional clustering algorithm	en_US
dc.subject	Scalable machine learning	en_US
dc.subject	Clustering algorithms	en_US
dc.title	Fuzzy based clustering algorithms to handle big data with implementation on Apache Spark	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering
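
For readers unfamiliar with the baseline named in the abstract, the following is a minimal single-machine sketch of the textbook fuzzy c-means iteration (alternating centre and membership updates). It is not the SRSIO-FCM or SLFCM algorithm from the paper and does not use Apache Spark; the function name `fcm` and the parameters `m`, `max_iter`, `tol`, and `seed` are assumptions chosen for illustration.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Illustrative sketch only: `m` is the fuzzifier (> 1), `tol` is the
    stopping threshold on the largest change in any membership value.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from squared distances to every centre.
        shift = 0.0
        for i in range(n):
            d2 = [
                sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                for j in range(c)
            ]
            if min(d2) == 0.0:
                # The point coincides with one or more centres: share full
                # membership among them to avoid dividing by zero.
                zeros = [j for j in range(c) if d2[j] == 0.0]
                new = [1.0 / len(zeros) if j in zeros else 0.0
                       for j in range(c)]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new

        if shift < tol:
            break

    return centers, u
```

Calling `fcm(points, c=2)` returns the list of cluster centres and one membership row per input point, where each row sums to 1. Every iteration touches all points, which is the cost that scalable variants such as those described in the abstract aim to reduce.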

Files in This Item:

File	Description	Size	Format
CP1.pdf Restricted Access		588.61 kB	Adobe PDF
