ABSTRACT

This paper summarizes research in progress on extracting socio-cultural content from language. Unlike prior work on information extraction from text, which targets only explicitly stated facts, the goal here is to infer properties and facts from language use, even when they are never stated explicitly. Further, the focus is on interactions among informal groups, whereas prior work on information extraction has focused on third-party reporting, e.g., news. The paper covers corpus collection, human assessments, and evaluation methodology. The study is underway in English and Arabic.