ABSTRACT

This chapter explores the border area between vision and natural language with respect to a particular task: obtaining verbal descriptions of scenes with motion. The task involves image understanding, as we assume that the time-varying scene to be described is represented by an image sequence. The chapter shows that visual motion analysis can lead to representations that map easily into the deep case frames of natural-language utterances. The core processes of natural-language understanding and generation were not developed as part of the NAOS project. The core of NAOS is a program that recognizes events in a geometrical scene description (GSD). A comparison of the GSD and the visualized geometrical scene description (VGSD) is at the heart of speech-act planning. Events are conceptual units designed to capture the semantics of verbs of locomotion; they are particular instantiations of event models. The choice of verb-oriented event models has several consequences, which are discussed in Neumann and Novak.