ABSTRACT

This chapter takes a practical perspective, surveying common issues in multilayer corpus design, acquisition and annotation. It discusses licensing issues, scalability, crowdsourcing and class-sourcing of annotations, and challenges arising from non-standard language data, including non-native and historical data. A detailed case study in multilayer corpus construction is then given using the Georgetown University Multilayer corpus, a very richly annotated, class-sourced corpus. The chapter concludes with an overview of issues in corpus format representations, merging data from multiple formats, and search and visualization of multilayer data.