HPC enabled a Novel Deep Fuzzy Scalable Clustering Algorithm and its Application for Protein Data

Jha, Preeti;Tiwari, Aruna;Anand, Vaibhav K.;Arya, Sudhanshu S.;Singh, Tanmay P.

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/10939

Title:	HPC enabled a Novel Deep Fuzzy Scalable Clustering Algorithm and its Application for Protein Data
Authors:	Jha, Preeti;Tiwari, Aruna;Anand, Vaibhav K.;Arya, Sudhanshu S.;Singh, Tanmay P.
Keywords:	Big data; Cluster computing; Clustering algorithms; Deep neural networks; Fuzzy inference; Fuzzy neural networks; Iterative methods; Proteins; Clusterings; Deep learning; Feature space; High-dimensional; ITS applications; Neural-networks; Performance computing; Protein data; Scalable algorithms; Scalable clustering; Fuzzy clustering
Issue Date:	2022
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Citation:	Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Patel, O. P., Anand, V., . . . Singh, T. (2022). HPC enabled a novel deep fuzzy scalable clustering algorithm and its application for protein data. Paper presented at the 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2022, doi:10.1109/CIBCB55180.2022.9863036 Retrieved from www.scopus.com
Abstract:	Fuzzy clustering is a common way to divide data into groups. Despite many improvements, it still struggles to cluster real, high-dimensional Big Data with complicated latent distributions. To address this problem, this study represents the data in a feature space built by a scalable deep neural network using Apache Spark on HPC. We propose SDnnRSIO-FCM, a Scalable Deep Neural Network Random Sampling Iterative Optimization-FCM clustering algorithm, and SDnnLFCM, a scalable version of the Deep Neural Network Literal Fuzzy c-Means algorithm. We focus on the design and implementation of both algorithms on an Apache Spark cluster in a High-Performance Computing (HPC) environment, handling Big Data by representing it in the feature space produced by the neural network. First, the data are mapped into a new feature space that provides a good representation and aids reconstruction of the original data. Second, scalable fuzzy clustering is embedded with neural networks to form deep fuzzy clustering methods. Experiments on two huge benchmark datasets show that SDnnRSIO-FCM outperforms SDnnLFCM in terms of Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F-score. Furthermore, when applied to huge soybean protein sequence data, SDnnRSIO-FCM shows a significant improvement over SDnnLFCM in terms of Silhouette index (SI), Davies-Bouldin index (DBI), and Calinski-Harabasz index (CHI). © 2022 IEEE.
URI:	https://doi.org/10.1109/CIBCB55180.2022.9863036 https://dspace.iiti.ac.in/handle/123456789/10939
ISBN:	978-1665484626
Type of Material:	Conference Paper
Appears in Collections:	Department of Computer Science and Engineering
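
Illustrative note: the abstract builds on literal fuzzy c-means (FCM) applied to a learned feature space. The sketch below shows only the standard FCM update loop in plain NumPy, as a rough aid to reading the abstract. It is not the authors' SDnnRSIO-FCM or SDnnLFCM implementation: it omits the deep neural network, random sampling, and Apache Spark, and the function name `fuzzy_c_means`, its parameters, and the input array `X` (standing in for already-extracted features) are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means on an (n_samples, n_features) array.

    Hypothetical helper for illustration; X is assumed to hold feature
    vectors that some upstream model has already produced.
    """
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row is normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m  # fuzzified memberships, m > 1 is the fuzzifier
        # Each center is the membership-weighted mean of all points.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Calling `fuzzy_c_means(X, n_clusters=2)` returns the cluster centers and a membership matrix whose rows sum to 1; taking `U.argmax(axis=1)` gives hard labels that could then be scored with indices such as those named in the abstract.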
