Developing a multimodal corpus of speech acts in situated discourse

ABSTRACT

This toolkit description introduces the scheme design, working definitions, annotation evaluation, data representation, and possible uses of a multimodal corpus of speech acts in situated discourse. The formulation and implementation of the illocutionary force segmentation and annotation scheme form the foundation for exploring live illocutionary forces. Once the scheme was designed, pilot annotations based on it were produced for selected corpus data. These annotations were then evaluated to select standard samples, in preparation for later large-scale data annotation. The study next tested annotation consistency, reliability, and validity; the tests confirmed that the annotator produced consistent annotations in both trials. The results indicate that the data and annotations in this corpus are reliable and valid for further statistical analysis.