ABSTRACT

A web crawler is a program that automatically traverses the Web by following links. It retrieves a document, extracts the links it contains, and then retrieves the documents those links reference, recursively repeating the process for every further document. Web crawlers are also called web wanderers, web robots, or spiders. These names suggest that the software itself moves between sites, but this is not the case. A crawler visits a site only by requesting documents from it, and then automatically requests the documents linked from those it has retrieved.
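
The retrieve-and-follow loop described above can be illustrated with a minimal sketch in Python, using only the standard library. It is not taken from the work summarized here: the names `LinkExtractor`, `fetch_over_http`, `crawl`, and the `max_pages` limit are assumptions introduced for illustration, and the recursion is written as an explicit queue with a set of already-seen URLs so that documents linking to each other are requested only once.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Request one document and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_url, fetch=fetch_over_http, max_pages=50):
    """Request documents reachable from seed_url; return URLs in visit order."""
    queue = deque([seed_url])   # documents still to be requested
    seen = {seed_url}           # every URL ever queued, to avoid repeats
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # unreachable document or unusable URL: skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current document and
            # drop any "#fragment", which names a place within a document.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The crawler never leaves the machine it runs on: every "visit" is a call to `fetch`, which only requests a document. Passing a different `fetch` function, for example one that reads from a dictionary of saved pages, lets the same loop be exercised without network access. A crawler intended for real use would also need politeness measures such as rate limiting, which this sketch omits.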