ABSTRACT

In this chapter we explore patterns of coherence relations between multiple utterances and clauses that coalesce to form a ‘tree of sentences’ out of each individual document in a corpus. After presenting three theoretical frameworks used for discourse relation analysis, as well as issues in the representation of discourse relations in corpus architecture, the chapter focuses on corpora annotated in the framework of Rhetorical Structure Theory. The first case study looks at the distribution of inter-sentential discourse relations such as ‘concession’, ‘justification’ or ‘contrast’ across genres and examines recurring properties of discourse graph topology in multiple documents. The second case study re-examines proposals from Veins Theory, which postulates constraints on referential accessibility (e.g. the ability to refer back to a phrase using a pronoun) based on discourse graph topology. By developing a quantitative predictive model of accessibility, we are able to produce ‘heat maps’ predicting the likely locus of previous mentions for pronominal anaphora, lexical phrases and bridging relations. Finally, in a third case study, an approach using artificial neural networks is adopted to identify the occurrence of discourse markers and other discourse relation signals. Tying into the existing literature on discourse relation signaling, a recurrent neural network (RNN) learns to score contiguous and discontinuous sequences of words as discourse relation signals, using deep learning techniques on information from distributional semantics and categorical annotations.