ABSTRACT

This chapter explains search engines, information retrieval, page classification, page clustering, and microblog summarization as techniques for mining Web contents.

To search Web pages, users usually consult Web search engines. Roughly speaking, the tasks on the search engine side are divided into the following processes:

• Crawling Web pages
• Content analysis and link analysis of Web pages
• Indexing Web pages
• Ranking Web pages
• Search and query processing of Web pages

Before explaining these in detail, the flow of processes in a typical search engine will be reviewed briefly (see Fig. 11.1).
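
As a rough illustration of how the processes listed above fit together, the following toy sketch chains crawling, analysis, indexing, ranking, and query processing over a small in-memory collection. Everything in it is an assumption made for illustration: the `TOY_WEB` data, the function names, and in particular the scoring rule (term frequency plus a small bonus per in-link), which is an ad hoc stand-in and not the ranking method of any real search engine or of this chapter.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("web search engines crawl pages", ["b.example", "c.example"]),
    "b.example": ("search engines index and rank pages", ["c.example"]),
    "c.example": ("ranking uses content and link analysis", []),
    "d.example": ("unreachable page about search", []),
}


def crawl(seed):
    """Crawling: breadth-first traversal of links starting from a seed URL."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = TOY_WEB[url]
        pages[url] = (text, links)
        for link in links:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def analyze_and_index(pages):
    """Content analysis, link analysis, and indexing in one pass."""
    index = defaultdict(dict)   # term -> {url: term frequency}
    inlinks = defaultdict(int)  # url -> number of crawled pages linking to it
    for url, (text, links) in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
        for link in links:
            inlinks[link] += 1
    return index, inlinks


def search(query, index, inlinks):
    """Query processing and ranking with an assumed, ad hoc score."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for url, tf in index.get(term, {}).items():
            scores[url] += tf + 0.5 * inlinks[url]
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


pages = crawl("a.example")
index, inlinks = analyze_and_index(pages)
print(search("search engines", index, inlinks))
```

Note that `d.example` is never returned, even for a matching query, because no crawled page links to it: only pages reached by the crawler are analyzed and indexed.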