ABSTRACT

Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani

Until recently, large-scale text and Web search systems regarded a document as a sequence of string tokens. Queries likewise consisted of string tokens, and the search engine’s job was to assign a score to each document based on the extent of matches between query and document tokens, the rarity of the query tokens in the corpus, and, more recently, the “prestige” of the Web document in the social network of hyperlinks.