ABSTRACT

Three empirical studies of coherence in large corpora of commentary text are sketched, showing that cue phrases are infrequent and that substantive coherence relations must be assigned in order to infer discourse structure. The notion of coherence is carefully defined in relation to the world, cognitive models of the world, and formal semantic representations of discourse. An efficient algorithm for assigning discourse coherence relations is described, which employs information from syntax, cue phrases, lexical items, formal semantics and naive semantics. The algorithm correctly assigns the coherence relations evident in an 8000-word corpus.