A Survey of Relevance Prediction Based Focused Web Crawlers on the Web

doi:10.1201/9781003388913-60

ABSTRACT

The demand for efficient and economical crawling strategies has grown sharply with the rapid expansion of dynamic web content. As a result, several novel ideas have been put forth, among which focused crawling has emerged as the most relevant. Focused crawlers search for web pages that satisfy predefined criteria. Owing to their excellent filtering and minimal memory and processing-time demands, focused crawlers have caught the attention of several search engines. This paper offers a survey of web crawling centered on relevance computation. From the currently available literature, 63 focused crawlers are grouped into three categories: classic, semantic, and learning focused crawlers. The importance of each metric and its impact on precision, recall, and harvest rate are examined. Future trends, bottlenecks, and solutions are also discussed for the benefit of readers.