A Novel Scalable Feature Extraction Approach for COVID-19 Protein Sequences and their Cluster Analysis with Kernelized Fuzzy Algorithm

Jha, Preeti; Tiwari, Aruna

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/9800

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jha, Preeti	en_US
dc.contributor.author	Tiwari, Aruna	en_US
dc.date.accessioned	2022-05-05T15:45:01Z	-
dc.date.available	2022-05-05T15:45:01Z	-
dc.date.issued	2022	-
dc.identifier.citation	Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Patel, O. P., Harshith, N., & Solasa, S. L. (2022). A novel scalable feature extraction approach for COVID-19 protein sequences and their cluster analysis with kernelized fuzzy algorithm. Paper presented at the Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022, 56-59. doi:10.1109/BigComp54360.2022.00021 Retrieved from www.scopus.com	en_US
dc.identifier.isbn	978-1665421973	-
dc.identifier.other	EID(2-s2.0-85127556824)	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/9800	-
dc.identifier.uri	https://doi.org/10.1109/BigComp54360.2022.00021	-
dc.description.abstract	COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. To address the global problem of analyzing different variants of COVID-19 genome sequences, there is a need to develop intelligent, scalable machine learning techniques that can process and analyze important COVID-19 protein data by utilizing the Big Data framework. To this end, we first propose a feature extraction approach for COVID-19 protein data named the Scalable Distributed Co-occurrence-based Probability-Specific Feature extraction approach (SDCPSF). The proposed SDCPSF approach is executed on an Apache Spark cluster to preprocess massive COVID-19 protein sequences. SDCPSF represents each variable-length COVID-19 protein sequence as a fixed-length, six-dimensional numeric feature vector. The extracted features are then used as input to the kernelized fuzzy clustering algorithms KSRSIO-FCM and KSLFCM, which efficiently cluster big data owing to their in-memory cluster computing technique and thus form clusters of COVID-19 genome sequences. Furthermore, the performance of KSRSIO-FCM is compared with that of another scalable clustering algorithm, KSLFCM, in terms of the Silhouette index (SI) and Davies-Bouldin index (DBI). © 2022 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	Proceedings - 2022 IEEE International Conference on Big Data and Smart Computing, BigComp 2022	en_US
dc.subject	Big data; Cluster analysis; Cluster computing; Clustering algorithms; Data mining; Extraction; Feature extraction; Fuzzy clustering; Learning systems; Proteins; Apache spark cluster; Coronavirus disease-19 protein sequence; Coronaviruses; Features extraction; Fuzzy algorithms; Genome sequences; Kernelized fuzzy clustering; Protein data; Protein sequences; World Health Organization; Coronavirus	en_US
dc.title	A Novel Scalable Feature Extraction Approach for COVID-19 Protein Sequences and their Cluster Analysis with Kernelized Fuzzy Algorithm	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering
