Scalable Fuzzy Clustering-based Regression to Predict the Isoelectric Points of the Plant Protein Sequences using Apache Spark

Choudhary, Ajay K.; Jha, Preeti; Tiwari, Aruna; Bharill, Neha

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4565

Title:	Scalable Fuzzy Clustering-based Regression to Predict the Isoelectric Points of the Plant Protein Sequences using Apache Spark
Authors:	Choudhary, Ajay K.; Jha, Preeti; Tiwari, Aruna; Bharill, Neha
Keywords:	Bioinformatics;Clustering algorithms;Data streams;Fuzzy clustering;Fuzzy systems;Machine learning;Mean square error;Proteins;Regression analysis;Handling technique;High-dimensional;Iso-electric points;Mean squared error;Non-stationary environment;Regression model;Root mean square errors;Underlying distribution;Learning algorithms
Issue Date:	2021
Publisher:	Institute of Electrical and Electronics Engineers Inc.
Citation:	Choudhary, A., Jha, P., Tiwari, A., Bharill, N., & Ratnaparkhe, M. (2021). Scalable fuzzy clustering-based regression to predict the isoelectric points of the plant protein sequences using Apache Spark. Paper presented at the IEEE International Conference on Fuzzy Systems, 2021-July. doi:10.1109/FUZZ45933.2021.9494447
Abstract:	Learning in non-stationary environments requires modern tools and algorithms that adapt quickly to new patterns, because concept drift can change the underlying distribution. Consequently, the usual assumption that the data are independent and identically distributed may be invalid in data stream scenarios. Given the massive volume of high-speed data streams and the presence of concept drift, traditional machine learning algorithms must be self-adapting. One of the difficulties in handling regression tasks is the complexity of the equations for the regression models when they are combined with drift handling techniques. High-dimensional protein data poses a major challenge for bioinformatics researchers analysing the dynamics of the sequences. This paper proposes a Scalable Fuzzy Clustering induced Regression (SFC-R) algorithm to predict the isoelectric point of plant protein sequences using Apache Spark clusters. The SFC-R algorithm uses input features extracted from the plant protein sequences, and its performance is validated in terms of mean squared error (MSE) and root-mean-square error (RMSE). Experiments on plant protein datasets are carried out to validate the high accuracy and robustness of our approach. © 2021 IEEE.
URI:	https://doi.org/10.1109/FUZZ45933.2021.9494447 https://dspace.iiti.ac.in/handle/123456789/4565
ISBN:	9781665444071
ISSN:	1098-7584
Type of Material:	Conference Paper
Appears in Collections:	Department of Computer Science and Engineering

Files in This Item:

There are no files associated with this item.
