Please use this identifier to cite or link to this item:
https://dspace.iiti.ac.in/handle/123456789/4648
Title: | N-gram approach for a URL similarity measure |
Authors: | Chaudhari, Narendra S. |
Keywords: | Learning systems;Text processing;Websites;Cosine;Distance measure;Jaccard distance;Misclassification error;N-grams;Similarity measure;Topic Classification;Web page classification;Distance measurement |
Issue Date: | 2017 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Citation: | Singh, N., & Chaudhari, N. S. (2017). N-gram approach for a URL similarity measure. Paper presented at the India International Conference on Information Processing, IICIP 2016 - Proceedings, doi:10.1109/IICIP.2016.7975313 |
Abstract: | This work addresses the problem of URL topic classification by making use of the text of Uniform Resource Locators (URLs). We introduce a method for classifying web pages into topics by extending the Jaccard distance measure and using the n-gram approach. We also compare our method with the best-performing known distance measures for Boolean data in the literature, i.e., the Jaccard, Dice and Cosine distance measures. The proposed method achieves a significant reduction of 3-7% in the misclassification error rate of the URLs relative to the Jaccard, Dice and Cosine distance measures. © 2016 IEEE. |
URI: | https://doi.org/10.1109/IICIP.2016.7975313 https://dspace.iiti.ac.in/handle/123456789/4648 |
ISBN: | 9781467369848 |
Type of Material: | Conference Paper |
Appears in Collections: | Department of Computer Science and Engineering |
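The abstract compares against the Jaccard, Dice and Cosine distance measures for Boolean data. As a rough illustration only, the sketch below computes those three baseline distances over character n-gram sets of two URLs. It does not reproduce the extended Jaccard measure proposed in the paper, which this record does not specify. The function names, the choice of character trigrams (n = 3), and the scheme-stripping step are assumptions made for this example.

```python
import math


def char_ngrams(url: str, n: int = 3) -> set:
    """Return the set of character n-grams of a URL (assumed tokenization)."""
    text = url.lower()
    # Assumption: drop the scheme so "http://" does not dominate the n-grams.
    for scheme in ("https://", "http://"):
        if text.startswith(scheme):
            text = text[len(scheme):]
            break
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def dice_distance(a: set, b: set) -> float:
    """1 - 2|A ∩ B| / (|A| + |B|); two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - 2 * len(a & b) / (len(a) + len(b))


def cosine_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / sqrt(|A| * |B|) for Boolean (set) data."""
    if not a or not b:
        return 0.0 if a == b else 1.0
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the functions.
    u1 = char_ngrams("http://example.com/sports/football")
    u2 = char_ngrams("http://example.org/sports/cricket")
    print(jaccard_distance(u1, u2))
    print(dice_distance(u1, u2))
    print(cosine_distance(u1, u2))
```

In a topic classification setting, distances like these could feed a nearest-neighbour rule over labelled URLs; that pairing is likewise an assumption here, not a detail taken from the record.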
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.