A Novel Feature Extraction Approach for the Clustering and Classification of Genome Sequences

Dwivedi, Rajesh; Tiwari, Aruna; Tripathi, Abhishek

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13207

Full metadata record

DC Field	Value	Language
dc.contributor.author	Dwivedi, Rajesh	en_US
dc.contributor.author	Tiwari, Aruna	en_US
dc.contributor.author	Tripathi, Abhishek	en_US
dc.date.accessioned	2024-02-21T06:31:04Z	-
dc.date.available	2024-02-21T06:31:04Z	-
dc.date.issued	2023	-
dc.identifier.isbn	978-1665430654	-
dc.identifier.other	EID(2-s2.0-85182948315)	-
dc.identifier.uri	https://doi.org/10.1109/SSCI52147.2023.10372047	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/13207	-
dc.description.abstract	Feature extraction is essential in bioinformatics because it transforms genome sequences into the feature vectors required for data mining activities such as classification and clustering. These activities make it possible to assign a newly sequenced genome to known families by classification or clustering. A variety of feature extraction strategies are now available for genome data. Nevertheless, several existing algorithms do not extract context-sensitive key properties, and some approaches extract features that cannot distinguish between two dissimilar sequences. In addition, the efficacy of existing feature extraction techniques is evaluated on either supervised or unsupervised learning models, but not on both. Thus, an efficient feature extraction technique that extracts significantly relevant features from genome sequences is required. In this paper, a novel feature extraction method is proposed that extracts features based on the length of the sequence, the frequency of nucleotide bases, the modified positional sum of nucleotide bases, the distribution of nucleotide bases, and the entropy of the sequence to generate a 14-dimensional fixed-length numeric vector that describes each genome sequence uniquely. The performance of the proposed method is assessed by applying the extracted features to both supervised and unsupervised machine learning approaches. The experimental results show that the proposed strategy is highly effective and efficient for clustering and classifying novel genome sequences into recognized genome classes. This is supported by a comparison of the proposed method with a standard state-of-the-art method. © 2023 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023	en_US
dc.subject	Classification	en_US
dc.subject	Clustering	en_US
dc.subject	Feature extraction	en_US
dc.subject	Genome sequences	en_US
dc.subject	Single nucleotide polymorphism	en_US
dc.title	A Novel Feature Extraction Approach for the Clustering and Classification of Genome Sequences	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering
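
The abstract lists the feature groups but does not define them, so the following is only a minimal sketch of the three that have a standard meaning: sequence length, relative frequency of each nucleotide base, and Shannon entropy of the base distribution. It yields 6 values, not the 14-dimensional vector of the paper; the modified positional sum and the distribution features are left out because this record does not specify them. The function name `sketch_features` and the A, C, G, T ordering are assumptions made for illustration.

```python
import math
from collections import Counter

# Assumed base alphabet and ordering; the record does not state one.
BASES = "ACGT"


def sketch_features(sequence: str) -> list[float]:
    """Return [length, freq_A, freq_C, freq_G, freq_T, entropy].

    Illustrative only: this is not the 14-dimensional vector described
    in the abstract, whose exact definition is not given in this record.
    """
    seq = sequence.upper()
    counts = Counter(seq)

    # Frequencies are taken relative to the number of A/C/G/T symbols,
    # so ambiguity codes such as N do not distort the distribution.
    valid = sum(counts[b] for b in BASES)
    if valid == 0:
        freqs = [0.0] * len(BASES)
    else:
        freqs = [counts[b] / valid for b in BASES]

    # Shannon entropy in bits; zero-frequency bases contribute nothing.
    entropy = float(sum(-p * math.log2(p) for p in freqs if p > 0))

    return [float(len(seq)), *freqs, entropy]
```

Because the output has the same length for every input, vectors built this way can be passed to either a classifier or a clustering algorithm, which is the property the abstract relies on when it evaluates both supervised and unsupervised models.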

Files in This Item:

There are no files associated with this item.
