Design of a Task-Oriented Hidden Web Crawler  

Abstract Category: Engineering
Course / Degree: Ph.D.
Institution / University: Maharishi Dayanand University, Rohtak, India
Published in: 2009

Dissertation Abstract / Summary:

The World Wide Web (WWW), the largest and most frequently accessed public repository of information ever developed, contains a large number of web pages interconnected through hyperlinks. The WWW can be divided into two parts: the Surface Web and the Deep Web. The Surface Web, also termed the Publicly Indexable Web (PIW), refers to the static Web pages that can be crawled and indexed by popular search engines. The Deep Web, on the other hand, refers to the content stored in Web databases and published through dynamic Web pages; users access these Web databases through specific query interfaces.
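Because Deep Web content is reachable only through query interfaces, a hidden Web crawler first has to recognise such interfaces in the pages it fetches. The Python sketch below is an illustration, not the method used in the thesis: it uses the standard library's `html.parser` to collect HTML forms and keeps those with at least one free-text field as candidate search interfaces. The names `FormFinder` and `find_query_interfaces`, the sample page, and the "has a text field" heuristic are all assumptions.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collects <form> elements and the named, user-fillable fields inside them."""

    def __init__(self):
        super().__init__()
        self.forms = []       # each entry: {"action": ..., "method": ..., "fields": [...]}
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action", ""),
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif self._current is not None and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            # <input> defaults to type "text"; <select>/<textarea> use the tag name as type
            field_type = (attrs.get("type") or "text").lower() if tag == "input" else tag
            # skip controls the crawler does not fill in itself
            if name and field_type not in ("submit", "hidden", "button", "reset"):
                self._current["fields"].append((name, field_type))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def find_query_interfaces(html):
    """Return the forms that look like search interfaces (at least one free-text field)."""
    parser = FormFinder()
    parser.feed(html)
    parser.close()
    return [f for f in parser.forms
            if any(t in ("text", "search") for _, t in f["fields"])]


# Hypothetical page: one search form and one login form.
page = """
<html><body>
<a href="/about.html">About</a>
<form action="/search" method="get">
  <input type="text" name="title">
  <select name="category"><option>Books</option></select>
  <input type="submit" value="Go">
</form>
<form action="/login" method="post">
  <input type="password" name="pw">
</form>
</body></html>
"""

forms = find_query_interfaces(page)
for form in forms:
    print(form["method"].upper(), form["action"], form["fields"])
# GET /search [('title', 'text'), ('category', 'select')]
```

The login form is dropped because its only field is a password, which is a reasonable (but crude) signal that it is not a query interface.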

In fact, there are more than 300,000 Deep Web databases and 450,000 query interfaces available in the hidden Web, and both figures are still growing quickly. Besides their scale, Web databases cover topics ranging from agriculture to the nuclear domain. Some Deep Web portal services provide Deep Web directories that classify Web databases into taxonomies. These databases contain a large amount of high-quality information; however, because they are hidden behind search interfaces, they cannot be crawled by traditional crawlers. Crawling the hidden Web is a very challenging problem, mainly for the following two fundamental reasons:

·        Access to these databases is provided only through restricted search interfaces, which are intended to be filled in manually.

·        Apart from the restricted access through search interfaces, the sheer size of the hidden Web is a problem: it is about 400 to 500 times larger than the Surface Web. As a result, it is not prudent to attempt comprehensive coverage of the hidden Web, and there is therefore a need to develop a domain-specific crawler for it.
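The second reason is what motivates a domain-specific crawler: instead of submitting every search form it finds, the crawler can skip forms that are unrelated to its task. As an illustration only (the thesis may use a different relevance test), the sketch below scores a form by the fraction of its field names that share a token with a hand-written domain vocabulary. The `BOOK_DOMAIN_TERMS` set, the tokenisation rule, and the 0.5 threshold are hypothetical.

```python
import re

# Hypothetical vocabulary for a crawler whose task is book search; a real
# system would curate or learn these terms rather than hard-code them.
BOOK_DOMAIN_TERMS = {"title", "author", "isbn", "publisher", "keyword", "subject"}


def tokenize(field_name):
    """Split a field name such as 'book_title' or 'authorName' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", field_name)  # camelCase -> camel Case
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def domain_score(field_names, domain_terms):
    """Fraction of fields whose name shares at least one token with the domain vocabulary."""
    if not field_names:
        return 0.0
    hits = sum(1 for name in field_names if tokenize(name) & domain_terms)
    return hits / len(field_names)


def is_in_domain(field_names, domain_terms, threshold=0.5):
    """Keep a form only if enough of its fields look relevant to the crawler's domain."""
    return domain_score(field_names, domain_terms) >= threshold


book_form = ["book_title", "authorName", "ISBN", "sort"]
flight_form = ["from_city", "to_city", "depart_date"]
print(domain_score(book_form, BOOK_DOMAIN_TERMS))    # 0.75
print(is_in_domain(flight_form, BOOK_DOMAIN_TERMS))  # False
```

Name matching is deliberately simple here; field labels, surrounding text, or option values would give a stronger signal in practice.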

In this thesis, the design and development of a novel framework for an Extensible and Scalable Domain-Specific (Task-Oriented) Hidden Web Crawler (DSHWC) is reported. The crawler is not only capable of crawling the hidden Web but can also efficiently handle databases hidden behind search interfaces with either a single attribute or multiple attributes.
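To reach the content behind a search interface, the crawler must also fill the interface in. The following is a minimal sketch, not the DSHWC's actual filling strategy, assuming the form is submitted with HTTP GET and that the crawler already holds candidate values for each field (the hypothetical `attribute_values` mapping; `example.com` is a placeholder). A single-attribute interface yields one query per value, while a multi-attribute interface yields one query per combination of values.

```python
from itertools import product
from urllib.parse import urlencode


def generate_queries(action_url, attribute_values, max_queries=None):
    """Yield GET URLs that fill a search form with candidate values.

    attribute_values maps each form field name to the list of values to try.
    One field gives one URL per value; several fields give one URL per
    combination of values (their Cartesian product).
    """
    names = list(attribute_values)
    if not names:
        return  # nothing to fill in
    value_lists = [attribute_values[name] for name in names]
    for count, combo in enumerate(product(*value_lists)):
        if max_queries is not None and count >= max_queries:
            return  # stop once the per-form query budget is used up
        yield action_url + "?" + urlencode(dict(zip(names, combo)))


# Single-attribute interface: one query per value.
single = list(generate_queries("http://example.com/search",
                               {"title": ["python", "data mining"]}))
# Multi-attribute interface: one query per combination of values.
multi = list(generate_queries("http://example.com/search",
                              {"title": ["python"], "format": ["hardcover", "ebook"]}))
for url in single + multi:
    print(url)
```

Because the number of combinations grows multiplicatively with each extra attribute (three fields with ten candidate values each already give 1,000 queries), the `max_queries` cap is one simple way to keep a multi-attribute form from dominating the crawl.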

Dissertation Keywords/Search Tags:
hidden web, crawler, search engine

This Dissertation Abstract may be cited as follows:
No user preference. Please use the standard reference methodology.

Submission Details: Dissertation Abstract submitted by Komal Kumar Bhatia from India on 31-Mar-2011 10:22.

Komal Kumar Bhatia Contact Details: Email: komal-bhatia1@rediffmail.com

© Copyright 2003 - 2024 of ThesisAbstracts.com and respective owners.
