ABSTRACT

The deep Web refers to content, including nontextual materials, that is stored in databases accessible through the Web. This content includes telephone books, dictionary definitions, library collections, Web-based auctions, news, stock reports, audio files, and video files. The deep Web is like a digital library: an online information environment offering access to an assortment of resources and information services. However, the deep Web is largely inaccessible to search engines because its pages do not exist until they are created dynamically in response to queries against databases managed by systems such as Microsoft Access, Oracle, and IBM’s DB2, typically expressed in SQL (Structured Query Language); the information is accessible only by query. Deep Web sites tend to be narrower in scope, with deeper content, than conventional sites, and their content has been described as highly relevant to every information need. By some estimates, the deep Web is up to 550 times larger than the surface Web, that is, the portion of the World Wide Web indexed by search engines, and it resides in topic-specific databases. One reason that search engines do not index the deep Web is that search technologies, despite their tremendous usefulness in helping searchers locate text documents on the Web, are limited in their capabilities. Another reason is that it is expensive for search engines to locate Web resources and maintain up-to-date indexes, which makes operating a comprehensive search engine impractical. Search engines must also deal with unreliable information: because practically anyone can post information to the Internet, much of it is likely to be incorrect, incomplete, or deceptive. The deep Web is the fastest-growing category of new information on the Internet, and it has been asserted that it will be the dominant source of information for the next-generation Internet. The deep Web is sometimes referred to as the Invisible Web.