 
 
Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/9794
| Title: | Cube Sampled K-Prototype Clustering for Featured Data | 
| Authors: | Jain, Seemandhar; Ahuja, Kapil | 
| Keywords: | K-means clustering; Principal component analysis; Clustering accuracy; Clusterings; Cube sampling; Hierarchical Clustering; K-means; K-prototype; K-prototype clustering; Principal-component analysis; Geometry | 
| Issue Date: | 2021 | 
| Publisher: | Institute of Electrical and Electronics Engineers Inc. | 
| Citation: | Jain, S., Shastri, A. A., Ahuja, K., Busnel, Y., & Singh, N. P. (2021). Cube sampled K-prototype clustering for featured data. Paper presented at the Proceedings of the 2021 IEEE 18th India Council International Conference, INDICON 2021, doi:10.1109/INDICON52576.2021.9691727 Retrieved from www.scopus.com | 
| Abstract: | Clustering large amounts of data is becoming increasingly important. Because of the large size of the data, clustering algorithms often take too much time. Sampling the data before clustering is commonly used to reduce this time. In this work, we propose a probabilistic sampling technique called cube sampling along with K-Prototype clustering. Cube sampling is used because of its accurate sample selection. K-Prototype is the most frequently used clustering algorithm when the data is numerical as well as categorical (very common today). The novelty of this work is in obtaining the crucial inclusion probabilities for cube sampling using Principal Component Analysis (PCA). Experiments on multiple datasets from the UCI repository demonstrate that the cube sampled K-Prototype algorithm gives the best clustering accuracy among other popular clustering algorithms sampled in the same way (K-Means, Hierarchical Clustering (HC), Spectral Clustering (SC)). When compared with unsampled K-Prototype, K-Means, HC and SC, it still has the best accuracy with the added advantage of reduced computational complexity (due to reduced data size). © 2021 IEEE. | 
| URI: | https://dspace.iiti.ac.in/handle/123456789/9794 https://doi.org/10.1109/INDICON52576.2021.9691727 | 
| ISBN: | 978-1665441759 | 
| Type of Material: | Conference Paper | 
| Appears in Collections: | Department of Computer Science and Engineering | 
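
The abstract names K-Prototype as the clustering step for data with both numerical and categorical features, but this record does not give its cost function. As a rough illustration only, the sketch below uses the commonly cited K-Prototypes dissimilarity (squared Euclidean distance on the numerical part plus a weighted count of categorical mismatches). The function names, the `gamma` weight, and the sample values are assumptions made for this example and are not taken from the paper; the cube sampling and PCA steps are not shown.

```python
from typing import List, Sequence, Tuple

# A prototype is a (numeric centre, categorical modes) pair.
# This layout is hypothetical and chosen only for the illustration.
Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on the numeric features plus
    gamma times the number of categorical mismatches."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric_part + gamma * categorical_part


def nearest_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: List[Prototype],
    gamma: float = 1.0,
) -> int:
    """Index of the prototype with the smallest mixed dissimilarity."""
    return min(
        range(len(prototypes)),
        key=lambda k: mixed_dissimilarity(
            x_num, x_cat, prototypes[k][0], prototypes[k][1], gamma
        ),
    )


if __name__ == "__main__":
    # Two made-up prototypes and one made-up point.
    prototypes = [([0.0, 0.0], ["red"]), ([5.0, 5.0], ["blue"])]
    print(nearest_prototype([4.0, 4.5], ["blue"], prototypes))
```

In this formulation `gamma` balances the two feature types: a larger value makes categorical mismatches count more against a prototype, and a value of zero reduces the assignment step to the one used by K-Means on the numerical features alone.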