A survey in semantic web technologies-inspired focused crawlers
|dc.contributor.author||Hussain, Farookh Khadeer|
|dc.identifier.citation||Dong, Hai and Hussain, Farookh Khadeer and Chang, Elizabeth. 2008. A survey in semantic web technologies-inspired focused crawlers, in Shoniregun, C.A. (ed), Third IEEE International Conference on Digital Information Management, Nov 13 2008, pp. 934-936. London, UK: Institute of Electrical and Electronics Engineers (IEEE).|
Crawlers are software programs that traverse the internet and retrieve webpages by following hyperlinks. In the face of the flood of spam websites, traditional web crawlers cannot cope well with this problem. Semantic focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and web documents. This paper briefly reviews recent studies on one category of semantic focused crawlers, ontology-based focused crawlers, which are a series of crawlers that utilize ontologies to link the fetched web documents with ontological concepts (topics). The purpose of this is to organize and categorize web documents, or to filter out irrelevant webpages with regard to the topics. A brief comparison is made among these crawlers from six perspectives: domain, working environment, special functions, technologies utilized, evaluation metrics and evaluation results. The conclusion with respect to this comparison is made in the final section.
|dc.publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|dc.title||A survey in semantic web technologies-inspired focused crawlers|
|dcterms.source.title||Proceedings of the 3rd IEEE international conference on digital information management (ICDIM 2008)|
|dcterms.source.series||Proceedings of the 3rd IEEE international conference on digital information management (ICDIM 2008)|
|dcterms.source.conference||3rd IEEE International Conference on Digital Information Management (ICDIM 2008)|
|dcterms.source.conference-start-date||13 Nov 2008|
Copyright © 2008 IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
|curtin.department||Centre for Extended Enterprises and Business Intelligence|
|curtin.faculty||Curtin Business School|
|curtin.faculty||School of Information Systems|