ABSTRACT

The integration of computational models of vision and natural language processing has significant practical and theoretical consequences. This chapter describes an information retrieval task in which this integration is crucial. Language can be used to describe the environment, and such descriptions fall into two classes: one describing the location of objects and the other describing the objects themselves. VerbalImage is a computational implementation of a theory of the description of objects, and in particular of the shape of objects. A small number of words are identified that denote an equally small number of prototypical shapes in a domain. A closed class of words and phrases signals shape modification of these prototypes. As prior theories of the semantics of shape modification suggest, words and phrases such as “thin” or “1 inch long” are sensitive to abstracted or idealized features of the shape prototypes of the objects in the domain. These abstract features of prototype shapes are limited in number and include, among others, primary and secondary dimensionality and intrinsic “up,” “front,” “left,” and “right.” In VerbalImage, natural language (NL) descriptions of physical objects are interpreted and rendered on a graphics screen; additional NL descriptions can then be used to modify the rendered image. The main contribution of this work is to show that linguistic and perceptual theories typified by Jackendoff (1991) and Biederman (1987), when appropriately modified, are useful in a computational setting. The work also makes new contributions in the area of subclassification.
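The representational scheme sketched above — a small set of shape prototypes whose abstract features (primary and secondary dimensionality, intrinsic axes) are targeted by closed-class modifiers — might be encoded as follows. This is a minimal illustrative sketch; every name, value, and parsing rule here is a hypothetical assumption, not VerbalImage's actual implementation.

```python
# Hypothetical sketch of a shape-prototype representation with
# closed-class modifier application; not the actual VerbalImage code.
from dataclasses import dataclass, field


@dataclass
class ShapePrototype:
    name: str
    primary_dim: float    # idealized extent along the primary axis
    secondary_dim: float  # idealized extent along the secondary axis
    # Intrinsic reference axes ("up", "front", "left", "right") as
    # unit vectors in a right-handed coordinate frame (assumed).
    intrinsic_axes: dict = field(default_factory=lambda: {
        "up": (0, 1, 0), "front": (0, 0, 1),
        "left": (-1, 0, 0), "right": (1, 0, 0),
    })


def apply_modifier(proto: ShapePrototype, modifier: str) -> ShapePrototype:
    """Interpret a closed-class shape modifier against the prototype's
    abstract features: "thin" scales the secondary dimension, while a
    measure phrase like "1 inch long" fixes the primary dimension."""
    if modifier == "thin":
        return ShapePrototype(proto.name, proto.primary_dim,
                              proto.secondary_dim * 0.5,
                              proto.intrinsic_axes)
    if modifier.endswith("long"):
        # Crude parse for the sketch: take the leading number.
        length = float(modifier.split()[0])
        return ShapePrototype(proto.name, length,
                              proto.secondary_dim, proto.intrinsic_axes)
    return proto


rod = ShapePrototype("rod", primary_dim=4.0, secondary_dim=1.0)
thin_rod = apply_modifier(rod, "thin")
short_rod = apply_modifier(rod, "1 inch long")
```

The point of the sketch is that modifiers operate on the abstracted features of the prototype, not on raw geometry, which is the sensitivity the abstract attributes to words like “thin.”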