Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/4565
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choudhary, Ajay K. | en_US |
dc.contributor.author | Jha, Preeti | en_US |
dc.contributor.author | Tiwari, Aruna | en_US |
dc.contributor.author | Bharill, Neha | en_US |
dc.date.accessioned | 2022-03-17T01:00:00Z | - |
dc.date.accessioned | 2022-03-17T15:34:51Z | - |
dc.date.available | 2022-03-17T01:00:00Z | - |
dc.date.available | 2022-03-17T15:34:51Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Choudhary, A., Jha, P., Tiwari, A., Bharill, N., & Ratnaparkhe, M. (2021). Scalable fuzzy clustering-based regression to predict the isoelectric points of the plant protein sequences using Apache Spark. Paper presented at the IEEE International Conference on Fuzzy Systems, 2021-July. doi:10.1109/FUZZ45933.2021.9494447 | en_US |
dc.identifier.isbn | 9781665444071 | - |
dc.identifier.issn | 1098-7584 | - |
dc.identifier.other | EID(2-s2.0-85114687205) | - |
dc.identifier.uri | https://doi.org/10.1109/FUZZ45933.2021.9494447 | - |
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/4565 | - |
dc.description.abstract | Learning in non-stationary environments requires modern tools and algorithms that adapt quickly to new patterns, because concept drift can change the underlying data distribution. Consequently, the usual assumption that data are independent and identically distributed may not hold in data-stream scenarios. Given the massive volume of high-speed data streams and the presence of concept drift, traditional machine learning algorithms must be self-adapting. One difficulty in regression tasks is the complexity of the regression-model equations when they are combined with drift-handling techniques. High-dimensional protein data also make it difficult for bioinformatics researchers to analyse the dynamics of the sequences. This paper proposes a Scalable Fuzzy Clustering induced Regression (SFC-R) algorithm to predict the isoelectric points of plant protein sequences using Apache Spark clusters. The SFC-R algorithm uses input features extracted from the plant protein sequences, and its performance is evaluated in terms of mean absolute error (MAE) and root-mean-square error (RMSE). Experiments on plant protein datasets are carried out to validate the high accuracy and robustness of our approach. © 2021 IEEE. | en_US |
dc.language.iso | en | en_US |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US |
dc.source | IEEE International Conference on Fuzzy Systems | en_US |
dc.subject | Bioinformatics | en_US |
dc.subject | Clustering algorithms | en_US |
dc.subject | Data streams | en_US |
dc.subject | Fuzzy clustering | en_US |
dc.subject | Fuzzy systems | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Mean square error | en_US |
dc.subject | Proteins | en_US |
dc.subject | Regression analysis | en_US |
dc.subject | Handling technique | en_US |
dc.subject | High-dimensional | en_US |
dc.subject | Iso-electric points | en_US |
dc.subject | Mean squared error | en_US |
dc.subject | Non-stationary environment | en_US |
dc.subject | Regression model | en_US |
dc.subject | Root mean square errors | en_US |
dc.subject | Underlying distribution | en_US |
dc.subject | Learning algorithms | en_US |
dc.title | Scalable Fuzzy Clustering-based Regression to Predict the Isoelectric Points of the Plant Protein Sequences using Apache Spark | en_US |
dc.type | Conference Paper | en_US |
Appears in Collections: | Department of Computer Science and Engineering |
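The abstract does not give the SFC-R equations, so the following is only a hypothetical, single-machine sketch of the general idea behind fuzzy-clustering-induced regression, not the authors' algorithm and not a Spark implementation. It assumes one numeric input feature, partitions it with standard fuzzy c-means, fits one membership-weighted least-squares line per cluster, and blends the local lines by each point's memberships. The names (`FuzzyClusterRegressor`, `memberships`, `weighted_line`) and the synthetic data are illustrative assumptions; the two error metrics reported in the abstract (MAE and RMSE) are included for evaluation.

```python
import math
import random


def memberships(x, centers, m):
    """Fuzzy c-means membership of a 1-D point x in each cluster (sums to 1); m > 1."""
    d = [abs(x - ck) for ck in centers]
    if min(d) == 0.0:
        # x coincides with one or more centers: split membership among them.
        hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
        total = sum(hits)
        return [h / total for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in d) for dk in d]


def fuzzy_c_means(xs, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means on 1-D data; returns (centers, membership rows)."""
    rng = random.Random(seed)
    u = []
    for _ in xs:
        row = [rng.random() + 1e-9 for _ in range(c)]  # random, strictly positive
        s = sum(row)
        u.append([v / s for v in row])
    centers = [0.0] * c
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik^m.
        for k in range(c):
            w = [row[k] ** m for row in u]
            centers[k] = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        # Membership update from distances to the new centers.
        u = [memberships(x, centers, m) for x in xs]
    return centers, u


def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx if sxx > 0.0 else 0.0
    return ybar - b * xbar, b


class FuzzyClusterRegressor:
    """Hypothetical regressor: one local line per fuzzy cluster,
    predictions blended by the point's cluster memberships."""

    def __init__(self, c=2, m=2.0, iters=100, seed=0):
        self.c, self.m, self.iters, self.seed = c, m, iters, seed

    def fit(self, xs, ys):
        self.centers, u = fuzzy_c_means(xs, self.c, self.m, self.iters, self.seed)
        # Each cluster's line is fitted with weights u_ik^m.
        self.lines = [
            weighted_line(xs, ys, [row[k] ** self.m for row in u])
            for k in range(self.c)
        ]
        return self

    def predict(self, xs):
        preds = []
        for x in xs:
            u = memberships(x, self.centers, self.m)
            preds.append(sum(uk * (a + b * x) for uk, (a, b) in zip(u, self.lines)))
        return preds


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Usage on synthetic piecewise-linear data (not data from the paper).
xs = [i / 10 for i in range(100)]
ys = [2 * x if x < 5 else 10 - 2 * (x - 5) for x in xs]
model = FuzzyClusterRegressor(c=2).fit(xs, ys)
pred = model.predict(xs)
print("MAE:", mae(ys, pred), "RMSE:", rmse(ys, pred))
```

Blending local lines by membership keeps predictions smooth across cluster boundaries. A scalable version would distribute the center and membership updates across workers; that part is not attempted here.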
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.