Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4648
Full metadata record
DC Field | Value | Language
dc.contributor.author | Chaudhari, Narendra S. | en_US
dc.date.accessioned | 2022-03-17T01:00:00Z | -
dc.date.accessioned | 2022-03-17T15:35:03Z | -
dc.date.available | 2022-03-17T01:00:00Z | -
dc.date.available | 2022-03-17T15:35:03Z | -
dc.date.issued | 2017 | -
dc.identifier.citation | Singh, N., & Chaudhari, N. S. (2017). N-gram approach for a URL similarity measure. Paper presented at the India International Conference on Information Processing, IICIP 2016 - Proceedings, doi:10.1109/IICIP.2016.7975313 | en_US
dc.identifier.isbn | 9781467369848 | -
dc.identifier.other | EID(2-s2.0-85027470887) | -
dc.identifier.uri | https://doi.org/10.1109/IICIP.2016.7975313 | -
dc.identifier.uri | https://dspace.iiti.ac.in/handle/123456789/4648 | -
dc.description.abstract | This work addresses the problem of URL topic classification by making use of the text of Uniform Resource Locators (URLs). We have introduced a method for classifying the web pages into topics by extending the Jaccard distance measure and using the n-gram approach. We have also compared our method with the best performing known distance measures for Boolean data in the literature i.e. Jaccard, Dice and Cosine distance measures. The proposed method achieves a significant reduction of 3-7% in the misclassification error rate of the URLs over the Jaccard, Dice and Cosine distance measures. © 2016 IEEE. | en_US
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | India International Conference on Information Processing, IICIP 2016 - Proceedings | en_US
dc.subject | Learning systems | en_US
dc.subject | Text processing | en_US
dc.subject | Websites | en_US
dc.subject | Cosine | en_US
dc.subject | Distance measure | en_US
dc.subject | Jaccard distance | en_US
dc.subject | Misclassification error | en_US
dc.subject | N-grams | en_US
dc.subject | Similarity measure | en_US
dc.subject | Topic Classification | en_US
dc.subject | Web page classification | en_US
dc.subject | Distance measurement | en_US
dc.title | N-gram approach for a URL similarity measure | en_US
dc.type | Conference Paper | en_US
Appears in Collections:Department of Computer Science and Engineering
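
Illustrative sketch (not part of the record): the abstract names the Jaccard, Dice and Cosine measures for Boolean data as the baselines, applied to n-grams of URL text. The snippet below shows only those three baselines computed over sets of character n-grams; the paper's proposed extension of the Jaccard measure is not described in this record and is not reproduced here. The function names, the choice of character (rather than token) n-grams, and the default n = 3 are assumptions made for illustration.

```python
import math


def char_ngrams(url, n=3):
    """Return the set of character n-grams of a lowercased URL (assumed tokenization)."""
    text = url.lower()
    if len(text) < n:
        # Too short to form a full n-gram: fall back to the whole string.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A and B| / |A or B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice similarity of two sets: 2|A and B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for Boolean (set) data: |A and B| / sqrt(|A| * |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical URLs, chosen only to exercise the functions.
    u1 = char_ngrams("example.com/sports/football")
    u2 = char_ngrams("example.org/sports/cricket")
    for name, sim in (("jaccard", jaccard), ("dice", dice), ("cosine", cosine)):
        s = sim(u1, u2)
        # The corresponding distance is taken here as 1 - similarity.
        print(f"{name}: similarity={s:.3f} distance={1 - s:.3f}")
```

Each similarity lies in [0, 1], so taking 1 minus the similarity gives a distance that could feed a nearest-neighbour style topic classifier; whether the paper uses that classifier is not stated in this record.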

Files in This Item:
There are no files associated with this item.


