ABSTRACT

Multilayer corpora challenge several theoretical notions in corpus design, such as the definitions of words, tokens and types; primary text versus annotation; issues in document and sub-corpus metadata; and the limits of graph-based architectures for corpus representation. This chapter focuses on conceptual issues in multilayer corpus design, defining the building blocks of complex architectures. The chapter surveys token and token-annotation-based designs, overlapping and conflicting annotation spans, hierarchical corpus structures such as syntax trees, as well as pointing relations and feature structures, which define potentially cyclic annotation structures.