Scaled and Projected Spectral Clustering with Vector Quantization for Handling Big Data

Nemade, Vishal; Shastri, Aditya A.; Ahuja, Kapil; Tiwari, Aruna

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4603

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nemade, Vishal	en_US
dc.contributor.author	Shastri, Aditya A.	en_US
dc.contributor.author	Ahuja, Kapil	en_US
dc.contributor.author	Tiwari, Aruna	en_US
dc.date.accessioned	2022-03-17T01:00:00Z	-
dc.date.accessioned	2022-03-17T15:34:56Z	-
dc.date.available	2022-03-17T01:00:00Z	-
dc.date.available	2022-03-17T15:34:56Z	-
dc.date.issued	2019	-
dc.identifier.citation	Nemade, V., Shastri, A., Ahuja, K., & Tiwari, A. (2019). Scaled and projected spectral clustering with vector quantization for handling big data. Paper presented at the Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence, SSCI 2018, 2174-2179. doi:10.1109/SSCI.2018.8628915	en_US
dc.identifier.isbn	9781538692769	-
dc.identifier.other	EID(2-s2.0-85062766825)	-
dc.identifier.uri	https://doi.org/10.1109/SSCI.2018.8628915	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/4603	-
dc.description.abstract	In this modern era, the advent of web technologies and social networking websites is generating a significant amount of data every day. In this scenario, where the data size is now reaching zettabytes (i.e., 10^21 bytes), its analysis is very important. Since spectral-based clustering algorithms provide more accurate results than traditional clustering algorithms, we focus on these algorithms. In our work, we propose a modified version of spectral clustering, which we call Projected Spectral Clustering (PSC). As the complexity of the PSC algorithm is O(n^3), where n is the size of the data, we use two variants of vector quantization sampling, namely k-Means (KM) and Bisecting k-Means (BKM). To make our algorithm scalable for handling Big Data, we implement it on Apache Spark using two approaches for computing the Gaussian Kernel matrix, which is the most important step here (i.e., Map Reduce and Map Only). We call this algorithm Scalable PSC (SPSC). We measure the accuracy of SPSC using three evaluation criteria tested on a variety of different datasets. Our new algorithm gives good clustering accuracies. Further, we perform another set of experiments on a different number of cores to demonstrate runtime/scalability efficiency of our algorithm. Finally, we prove this scalability by doing a complexity analysis. © 2018 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence, SSCI 2018	en_US
dc.subject	Artificial intelligence	en_US
dc.subject	Big data	en_US
dc.subject	Cluster analysis	en_US
dc.subject	Matrix algebra	en_US
dc.subject	Sampling	en_US
dc.subject	Scalability	en_US
dc.subject	Social sciences computing	en_US
dc.subject	Vector quantization	en_US
dc.subject	Clustering accuracy	en_US
dc.subject	Complexity analysis	en_US
dc.subject	Evaluation criteria	en_US
dc.subject	Gaussian kernels	en_US
dc.subject	Map-reduce	en_US
dc.subject	Spectral clustering	en_US
dc.subject	Traditional clustering	en_US
dc.subject	Web technologies	en_US
dc.subject	K-means clustering	en_US
dc.title	Scaled and Projected Spectral Clustering with Vector Quantization for Handling Big Data	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering

Files in This Item:

There are no files associated with this item.
