Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/13207
Full metadata record
DC Field | Value | Language
dc.contributor.author | Dwivedi, Rajesh | en_US
dc.contributor.author | Tiwari, Aruna | en_US
dc.contributor.author | Tripathi, Abhishek | en_US
dc.date.accessioned | 2024-02-21T06:31:04Z | -
dc.date.available | 2024-02-21T06:31:04Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Dwivedi, R., Tiwari, A., Bharill, N., Ratnaparkhe, M., Tripathi, A., & Jha, P. (2023). A Novel Feature Extraction Approach for the Clustering and Classification of Genome Sequences. 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023. Scopus. https://doi.org/10.1109/SSCI52147.2023.10372047 | en_US
dc.identifier.isbn | 978-1665430654 | -
dc.identifier.other | EID(2-s2.0-85182948315) | -
dc.identifier.uri | https://doi.org/10.1109/SSCI52147.2023.10372047 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/13207 | -
dc.description.abstract | Feature extraction is essential in bioinformatics because it transforms genome sequences into the feature vectors required for data mining activities such as classification and clustering. These activities make it possible to assign a newly sequenced genome to known families. A variety of feature extraction strategies are now available for genome data. Nevertheless, several existing algorithms do not extract context-sensitive key properties, and some approaches extract features that cannot distinguish between two dissimilar sequences. In addition, the efficacy of existing feature extraction techniques is evaluated on either supervised or unsupervised learning models, but not on both. Thus, an efficient feature extraction technique that extracts highly relevant features from genome sequences is required. In this paper, a novel feature extraction method is proposed that extracts features based on the length of the sequence, the frequency of nucleotide bases, the modified positional sum of nucleotide bases, the distribution of nucleotide bases, and the entropy of the sequence to generate a 14-dimensional fixed-length numeric vector that describes each genome sequence uniquely. The performance of the proposed method is assessed by applying the extracted features to both supervised and unsupervised machine learning approaches. The experimental results show that the proposed strategy is highly effective and efficient for clustering and classifying novel genome sequences into recognized genome classes. A comparison with the standard state-of-the-art method confirms this. © 2023 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | 2023 IEEE Symposium Series on Computational Intelligence, SSCI 2023 | en_US
dc.subject | Classification | en_US
dc.subject | Clustering | en_US
dc.subject | Feature extraction | en_US
dc.subject | Genome sequences | en_US
dc.subject | Single nucleotide polymorphism | en_US
dc.title | A Novel Feature Extraction Approach for the Clustering and Classification of Genome Sequences | en_US
dc.type | Conference Paper | en_US
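
The abstract in the record above names five feature groups (sequence length, base frequencies, a modified positional sum, base distribution, and entropy) that together give a 14-dimensional vector. The sketch below is only an illustration of how such a vector might be assembled. It assumes a 1 + 4 + 4 + 4 + 1 split over the bases A, C, G, T, and its definitions of "positional sum" (sum of 1-based positions, scaled) and "distribution" (scaled variance of positions) are hypothetical stand-ins, since the abstract does not define them. The function name `extract_features` is likewise an assumption, not the authors' code.

```python
import math

BASES = "ACGT"  # assumed alphabet; other symbols (e.g. N) are ignored


def extract_features(seq: str) -> list[float]:
    """Return an illustrative 14-dimensional vector for a nucleotide sequence.

    Layout (an assumption, not taken from the paper):
    [length, 4 frequencies, 4 positional sums, 4 distributions, entropy]
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")

    # Record the 1-based positions at which each base occurs.
    positions = {b: [] for b in BASES}
    for i, ch in enumerate(seq, start=1):
        if ch in positions:
            positions[ch].append(i)

    counts = [len(positions[b]) for b in BASES]
    total = sum(counts)

    # Frequency of each base relative to the sequence length.
    freqs = [c / n for c in counts]

    # Hypothetical "positional sum": sum of positions, scaled by 1 + 2 + ... + n.
    max_sum = n * (n + 1) / 2
    pos_sums = [sum(positions[b]) / max_sum for b in BASES]

    # Hypothetical "distribution": variance of positions, scaled by n squared.
    def spread(pos: list[int]) -> float:
        if not pos:
            return 0.0
        mean = sum(pos) / len(pos)
        return sum((p - mean) ** 2 for p in pos) / len(pos) / (n * n)

    dists = [spread(positions[b]) for b in BASES]

    # Shannon entropy (in bits) of the base composition.
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy += p * math.log2(1 / p)

    return [float(n)] + freqs + pos_sums + dists + [entropy]


if __name__ == "__main__":
    print(extract_features("ACGT"))
    print(extract_features("TGCA"))
```

Under these assumed definitions, `ACGT` and `TGCA` share the same length, base frequencies, and entropy but differ in their positional sums, which illustrates why position-aware features can separate sequences that composition alone cannot.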
Appears in Collections: Department of Computer Science and Engineering

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
