Scalable Fuzzy Clustering-based Regression to Predict the Isoelectric Points of the Plant Protein Sequences using Apache Spark

Choudhary, Ajay K.; Jha, Preeti; Tiwari, Aruna; Bharill, Neha

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4565

Full metadata record

DC Field	Value	Language
dc.contributor.author	Choudhary, Ajay K.	en_US
dc.contributor.author	Jha, Preeti	en_US
dc.contributor.author	Tiwari, Aruna	en_US
dc.contributor.author	Bharill, Neha	en_US
dc.date.accessioned	2022-03-17T01:00:00Z	-
dc.date.accessioned	2022-03-17T15:34:51Z	-
dc.date.available	2022-03-17T01:00:00Z	-
dc.date.available	2022-03-17T15:34:51Z	-
dc.date.issued	2021	-
dc.identifier.citation	Choudhary, A., Jha, P., Tiwari, A., Bharill, N., & Ratnaparkhe, M. (2021). Scalable fuzzy clustering-based regression to predict the isoelectric points of the plant protein sequences using Apache Spark. Paper presented at the IEEE International Conference on Fuzzy Systems, 2021-July. doi:10.1109/FUZZ45933.2021.9494447	en_US
dc.identifier.isbn	9781665444071	-
dc.identifier.issn	1098-7584	-
dc.identifier.other	EID(2-s2.0-85114687205)	-
dc.identifier.uri	https://doi.org/10.1109/FUZZ45933.2021.9494447	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/4565	-
dc.description.abstract	Learning in non-stationary environments requires modern tools and algorithms to quickly adapt to the new pattern because concept drift can change the underlying distribution. So, the existing assumption that the data is independent and identically distributed may be invalid in data stream scenarios. Given the massive volume of high-speed data streams and the concept drift, traditional machine learning algorithms must be self-adapting. One of the difficulties in handling regression tasks is the complexity of the equations for the regression models when combined with drift handling techniques. The high dimensional protein data is a major challenge for bioinformatics researchers to analyse the dynamics of the sequences. This paper proposes a Scalable Fuzzy Clustering induced Regression (SFC-R) algorithm to predict the isoelectric point of the plant protein sequences using Apache Spark clusters. The SFC-R algorithm uses the input features extracted from the plant protein sequences and validates performance in terms of mean squared error (MAE) and root-mean-square error (RMSE). Experiments on plant protein datasets are carried out to validate the high accuracy and robustness of our approach. © 2021 IEEE.	en_US
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	en_US
dc.source	IEEE International Conference on Fuzzy Systems	en_US
dc.subject	Bioinformatics	en_US
dc.subject	Clustering algorithms	en_US
dc.subject	Data streams	en_US
dc.subject	Fuzzy clustering	en_US
dc.subject	Fuzzy systems	en_US
dc.subject	Machine learning	en_US
dc.subject	Mean square error	en_US
dc.subject	Proteins	en_US
dc.subject	Regression analysis	en_US
dc.subject	Handling technique	en_US
dc.subject	High-dimensional	en_US
dc.subject	Iso-electric points	en_US
dc.subject	Mean squared error	en_US
dc.subject	Non-stationary environment	en_US
dc.subject	Regression model	en_US
dc.subject	Root mean square errors	en_US
dc.subject	Underlying distribution	en_US
dc.subject	Learning algorithms	en_US
dc.title	Scalable Fuzzy Clustering-based Regression to Predict the Isoelectric Points of the Plant Protein Sequences using Apache Spark	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering

Files in This Item:

There are no files associated with this item.
