ABSTRACT

Information extraction (IE) is the process of scanning text for information relevant to some interest, including extracting entities, relations, and, most challenging, events, or who did what to whom, when, and where. It requires deeper analysis than keyword searches, but its aims fall short of the very hard and long-term problem of text understanding, where we seek to capture all the information in a text, along with the speaker’s or writer’s intention. IE represents a midpoint on this spectrum, where the aim is to capture structured information without sacrificing feasibility. IE typically focuses on surface linguistic phenomena that do not require deep inference, and it focuses on the phenomena that are most frequent in texts.