Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4819
Full metadata record
DC Field | Value | Language
dc.contributor.author | Jha, Preeti | en_US
dc.contributor.author | Tiwari, Aruna | en_US
dc.contributor.author | Bharill, Neha | en_US
dc.contributor.author | Mounika, Mukkamalla | en_US
dc.date.accessioned | 2022-03-17T01:00:00Z | -
dc.date.accessioned | 2022-03-17T15:35:37Z | -
dc.date.available | 2022-03-17T01:00:00Z | -
dc.date.available | 2022-03-17T15:35:37Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Jha, P., Tiwari, A., Bharill, N., Ratnaparkhe, M., Mounika, M., & Nagendra, N. (2021). Apache spark based kernelized fuzzy clustering framework for single nucleotide polymorphism sequence analysis. Computational Biology and Chemistry, 92. doi:10.1016/j.compbiolchem.2021.107454 | en_US
dc.identifier.issn | 1476-9271 | -
dc.identifier.other | EID(2-s2.0-85102073478) | -
dc.identifier.uri | https://doi.org/10.1016/j.compbiolchem.2021.107454 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/4819 | -
dc.description.abstract | This paper introduces a kernel-based fuzzy clustering approach for non-linearly separable problems that applies a Radial Basis Function (RBF) kernel, which maps the input data space non-linearly into a high-dimensional feature space. Discovering clusters in high-dimensional genomics data is extremely challenging for bioinformatics researchers performing genome analysis. To support investigations in bioinformatics, specifically genomic clustering, we propose high-dimensional kernelized fuzzy clustering algorithms based on the Apache Spark framework for clustering Single Nucleotide Polymorphism (SNP) sequences. The paper proposes Kernelized Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (KSRSIO-FCM), which internally uses another proposed algorithm, Kernelized Scalable Literal Fuzzy c-Means (KSLFCM). Both approaches are fully adapted to the Apache Spark cluster framework through a localized sub-clustering Resilient Distributed Dataset (RDD) method. Additionally, we propose a preprocessing approach that generates numeric feature vectors for huge SNP sequences and is made scalable by executing it on an Apache Spark cluster; it is applied to real-world SNP datasets taken from open internet repositories for two plant species, soybean and rice. Comparison of the proposed scalable kernelized fuzzy clustering results with similar works shows a significant improvement of the proposed algorithm in terms of time and space complexity, Silhouette index, and Davies-Bouldin index. Exhaustive experiments are performed on various SNP datasets to show the effectiveness of the proposed KSRSIO-FCM in comparison with the proposed KSLFCM and other scalable clustering algorithms, i.e., SRSIO-FCM and SLFCM. © 2021 Elsevier Ltd | en_US
dc.language.iso | en | en_US
dc.publisher | Elsevier Ltd | en_US
dc.source | Computational Biology and Chemistry | en_US
dc.subject | Bioinformatics | en_US
dc.subject | Fuzzy clustering | en_US
dc.subject | Fuzzy systems | en_US
dc.subject | Iterative methods | en_US
dc.subject | Nucleotides | en_US
dc.subject | Polymorphism | en_US
dc.subject | High-dimensional feature space | en_US
dc.subject | Iterative Optimization | en_US
dc.subject | Kernelized fuzzy clustering | en_US
dc.subject | Preprocessing approaches | en_US
dc.subject | Radial Basis Function (RBF) | en_US
dc.subject | Resilient distributed dataset | en_US
dc.subject | Single-nucleotide polymorphisms | en_US
dc.subject | Time and space complexity | en_US
dc.subject | Clustering algorithms | en_US
dc.subject | algorithm | en_US
dc.subject | biology | en_US
dc.subject | cluster analysis | en_US
dc.subject | fuzzy logic | en_US
dc.subject | genetic database | en_US
dc.subject | genetics | en_US
dc.subject | human | en_US
dc.subject | single nucleotide polymorphism | en_US
dc.subject | Algorithms | en_US
dc.subject | Cluster Analysis | en_US
dc.subject | Computational Biology | en_US
dc.subject | Databases, Genetic | en_US
dc.subject | Fuzzy Logic | en_US
dc.subject | Humans | en_US
dc.subject | Polymorphism, Single Nucleotide | en_US
dc.title | Apache Spark based kernelized fuzzy clustering framework for single nucleotide polymorphism sequence analysis | en_US
dc.type | Journal Article | en_US
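
The abstract in this record describes fuzzy c-means clustering carried out through an RBF kernel. As a rough illustration of that general idea only, the following single-machine sketch implements a common textbook variant of kernelized fuzzy c-means in which cluster centers stay in the input space. It is not the paper's KSLFCM or KSRSIO-FCM, it does not use Apache Spark or RDDs, and the function names, the `sigma` and `m` defaults, the `init_centers` parameter, and the toy data are all assumptions made for this example.

```python
import math


def rbf_kernel(x, v, sigma):
    """Gaussian RBF kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def _memberships(points, centers, m, sigma, eps=1e-12):
    """Fuzzy memberships from the kernel-induced distance 1 - K(x, v)."""
    rows = []
    for x in points:
        # eps guards against division by zero when x coincides with a center.
        weights = [
            max(1.0 - rbf_kernel(x, v, sigma), eps) ** (-1.0 / (m - 1.0))
            for v in centers
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def kernel_fcm(points, n_clusters, m=2.0, sigma=1.0, max_iter=100,
               tol=1e-6, init_centers=None):
    """Generic kernelized fuzzy c-means (illustrative, not the paper's method).

    Centers default to the first n_clusters points (a hypothetical choice).
    Returns (centers, memberships); memberships[j][i] is the degree to which
    point j belongs to cluster i.
    """
    centers = [list(c) for c in (init_centers or points[:n_clusters])]
    dim = len(points[0])
    for _ in range(max_iter):
        u = _memberships(points, centers, m, sigma)
        new_centers = []
        for i in range(n_clusters):
            # Each point is weighted by u^m times its kernel value to center i.
            coeffs = [
                (u[j][i] ** m) * rbf_kernel(points[j], centers[i], sigma)
                for j in range(len(points))
            ]
            denom = sum(coeffs)
            if denom == 0.0:
                new_centers.append(centers[i])  # keep the old center
                continue
            new_centers.append([
                sum(c * x[d] for c, x in zip(coeffs, points)) / denom
                for d in range(dim)
            ])
        shift = max(
            abs(a - b)
            for old, new in zip(centers, new_centers)
            for a, b in zip(old, new)
        )
        centers = new_centers
        if shift < tol:
            break
    return centers, _memberships(points, centers, m, sigma)


# Toy data (made up for illustration): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
centers, memberships = kernel_fcm(
    points, n_clusters=2, sigma=2.0, init_centers=[points[0], points[3]]
)
labels = [max(range(2), key=lambda i: row[i]) for row in memberships]
print(labels)
```

Because the RBF kernel gives distant points almost no weight in the center update, this variant is sensitive to initialization: two initial centers placed in the same group can stay there. The example therefore seeds one center in each group explicitly.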
Appears in Collections: Department of Computer Science and Engineering



