Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/10939
Title: | HPC enabled a Novel Deep Fuzzy Scalable Clustering Algorithm and its Application for Protein Data |
Authors: | Jha, Preeti; Tiwari, Aruna; Anand, Vaibhav K.; Arya, Sudhanshu S.; Singh, Tanmay P. |
Keywords: | Big data; Cluster computing; Clustering algorithms; Deep neural networks; Fuzzy inference; Fuzzy neural networks; Iterative methods; Proteins; Clusterings; Deep learning; Feature space; High-dimensional; ITS applications; Neural-networks; Performance computing; Protein data; Scalable algorithms; Scalable clustering; Fuzzy clustering |
Issue Date: | 2022 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Citation: | Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Patel, O. P., Anand, V., . . . Singh, T. (2022). HPC enabled a novel deep fuzzy scalable clustering algorithm and its application for protein data. Paper presented at the 2022 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2022, doi:10.1109/CIBCB55180.2022.9863036 Retrieved from www.scopus.com |
Abstract: | Fuzzy clustering is a common way to partition data into groups. Despite many improvements, it still struggles to cluster real, high-dimensional Big Data with complicated latent distributions. To address this problem, this study represents the data in a feature space built by a scalable deep neural network using Apache Spark on HPC. We propose SDnnRSIO-FCM, a Scalable Deep Neural Network Random Sampling Iterative Optimization-FCM clustering algorithm, and SDnnLFCM, a scalable version of the Deep Neural Network Literal Fuzzy c-Means algorithm. We focus on the design and implementation of both algorithms on an Apache Spark cluster in a High-Performance Computing (HPC) environment, representing the data in a feature space produced by the neural network to handle Big Data. First, data is mapped into a new feature space that provides a good representation and aids reconstruction of the original data. Second, scalable fuzzy clustering is embedded with neural networks to obtain deep fuzzy clustering methods. Experiments on two large benchmark datasets show that SDnnRSIO-FCM outperforms SDnnLFCM in terms of Normalized Mutual Information (NMI), Adjusted Rand Index (ARI), and F-score. Furthermore, when applied to a large set of soybean protein sequences, SDnnRSIO-FCM shows a significant improvement over SDnnLFCM in terms of Silhouette index (SI), Davies-Bouldin index (DBI), and Calinski-Harabasz index (CHI). © 2022 IEEE. |
URI: | https://doi.org/10.1109/CIBCB55180.2022.9863036 https://dspace.iiti.ac.in/handle/123456789/10939 |
ISBN: | 978-1665484626 |
Type of Material: | Conference Paper |
Appears in Collections: | Department of Computer Science and Engineering |
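Illustrative note: the abstract builds on fuzzy c-means (FCM) clustering. This record does not specify the SDnnRSIO-FCM or SDnnLFCM algorithms, so the sketch below shows only the standard single-machine FCM update (alternating center and membership updates), with no neural network feature mapping, random sampling, or Apache Spark. The function name `fuzzy_c_means`, its parameters, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means (hypothetical helper, not the paper's algorithm).

    points: list of equal-length numeric tuples.
    m: fuzzifier (> 1); larger values give softer memberships.
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Start from a random membership matrix whose rows sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: inverse-distance weighting with exponent 2/(m-1).
        shift = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # Point coincides with a center: assign it fully there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_row = [h / total for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once memberships have effectively stopped changing.
        if shift < tol:
            break

    return centers, u


# Toy data (assumed): two well-separated groups in the plane.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, n_clusters=2)
toy_labels = [max(range(2), key=lambda j: row[j]) for row in toy_memberships]
```

Taking the highest membership per point, as `toy_labels` does, yields hard labels of the kind that measures such as NMI or ARI would compare against ground truth.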
Files in This Item:
There are no files associated with this item.