ABSTRACT

The volume of machine-readable text is growing exponentially. In news media, government, medicine, law, and other fields, machine-readable texts and abstracts have been stockpiled for decades. Wire and news services make text available in real time across worldwide networks. This information explosion has reached a scale at which institutions and busy professionals cannot read and absorb the unanalyzed glut of news, memos, and articles. This chapter explores the problem of asking questions about text, and of text, by computational means. Current text retrieval systems match key words in the query against the words of the text. However, these methods have proven inaccurate, with recall as low as 20% (Blair & Maron, 1985). Experienced users express dissatisfaction with these systems in several areas: the types of questions that can be asked, the representation language of the query, and the relevance of the replies. In the typical key word-based text retrieval system, the user begins a session with a few key words and is inundated with irrelevant references. As the user adds more key words, the number of documents that contain all of the key words declines very rapidly to zero, in a “cross-section” effect.