Information extraction from semi-structured and un-structured documents using probabilistic context free grammar inference

Chaudhari, Narendra S.

Please use this identifier to cite or link to this item: https://dspace.iiti.ac.in/handle/123456789/4773

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chaudhari, Narendra S.	en_US
dc.date.accessioned	2022-03-17T01:00:00Z	-
dc.date.accessioned	2022-03-17T15:35:26Z	-
dc.date.available	2022-03-17T01:00:00Z	-
dc.date.available	2022-03-17T15:35:26Z	-
dc.date.issued	2012	-
dc.identifier.citation	Thakur, R., Jain, S., Chaudhari, N. S., & Singhai, R. (2012). Information extraction from semi-structured and un-structured documents using probabilistic context free grammar inference. Paper presented at the Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12, 273-276. doi:10.1109/InfRKM.2012.6204988	en_US
dc.identifier.isbn	9781467310901	-
dc.identifier.other	EID(2-s2.0-84863088952)	-
dc.identifier.uri	https://doi.org/10.1109/InfRKM.2012.6204988	-
dc.identifier.uri	https://dspace.iiti.ac.in/handle/123456789/4773	-
dc.description.abstract	A large number of research papers are available in un-structured (text) format. Knowledge discovery in un-structured documents has been recognized as a promising task. These documents are typically formatted for human viewing, and the formatting varies widely from document to document. Frequent changes in formatting make it difficult to construct a global schema. Thus, discovering interesting rules from them is a complex and tedious process. Recently, conditional random fields (CRFs) and hand-coded wrappers have been used to label the text (such as Title, Author Name(s), Affiliation, Email, Contact number, etc. in research papers). In this paper we propose a novel hybrid approach to infer grammar rules using alignment similarity and probabilistic context free grammar. It helps in extracting the desired information from the document. © 2012 IEEE.	en_US
dc.language.iso	en	en_US
dc.source	Proceedings - 2012 International Conference on Information Retrieval and Knowledge Management, CAMP'12	en_US
dc.subject	Conditional random fields (CRFs)	en_US
dc.subject	Global schemas	en_US
dc.subject	Grammar inference	en_US
dc.subject	Grammar rules	en_US
dc.subject	Hybrid approach	en_US
dc.subject	Information Extraction	en_US
dc.subject	Interesting rules	en_US
dc.subject	Probabilistic context free grammars	en_US
dc.subject	Research papers	en_US
dc.subject	Semi-structured	en_US
dc.subject	Sequence mining	en_US
dc.subject	Alignment	en_US
dc.subject	Data mining	en_US
dc.subject	Information retrieval	en_US
dc.subject	Knowledge management	en_US
dc.subject	Learning systems	en_US
dc.subject	Context free grammars	en_US
dc.title	Information extraction from semi-structured and un-structured documents using probabilistic context free grammar inference	en_US
dc.type	Conference Paper	en_US
Appears in Collections:	Department of Computer Science and Engineering

Files in This Item:

There are no files associated with this item.
