ABSTRACT

This chapter is concerned with the perception and comprehension of real-world scenes. Comprehension requires not only that the creatures and objects comprising the scene be identified but also that the relations among these entities be specified. No more than five classes of relations may be needed to characterize the difference in organization between a well-formed scene and an array of unrelated objects. The first two, support and interposition, reflect the general physical constraints—that most objects do not float in air and that an opaque object will occlude the contours of an object behind it. The third, probability, refers to the likelihood of a given object (e.g., a bassinet) being in a given scene (e.g., a service station). Fourth, objects that are likely to occur in a given scene (e.g., a gas pump in a service station) often occupy specific positions (e.g., not on a car). Fifth is the familiar size of objects. Thus, cups are not bigger than stoves. Although support and interposition can be specified without knowing what the object is, the other relations require access to the referential meaning or semantics of the object and its context.

A schema of a scene is taken to be the overall internal representation of the scene that integrates the scene’s entities and relations and allows access to semantic information. Effects attributable to the semantic relations are taken as an operational definition of schema activation. Several accounts of perception hold that a schema is activated only after the physical relations are specified and the objects identified. Seven experiments exploring this issue are described. The experimental technique employed stimuli in which an object in a scene was displaced to another part of the scene or put in another scene so as to violate one to three of the five relations. Such objects appear to be floating, passing through the background, unlikely to be in the scene, unlikely to be in a given position in the scene, or too large or too small for the scene. The experiments measured the effects of these violations on the speed and accuracy of (1) detecting an object undergoing a violation, or (2) detecting the presence of the violation itself.

Contrary to the physical-then-semantic-relations view of scene perception, the results indicate that semantic relations are accessed at least as fast as physical relations, and fast enough, in fact, for violations of the semantic relations to affect the perception of objects. Extensive semantic processing of a scene can be readily achieved from a single fixation. It is thus unnecessary to postulate eye-fixation sequences or motion to explain scene perception. Routes to the initial elicitation of a schema through scene-emergent features and the probabilistic relations among objects are discussed. The mechanisms for gaining semantic access to a real-world scene can be triggered so quickly and efficiently that conditions can readily be found in which an expectancy for a scene or familiarity with it is neither necessary nor even helpful toward its perception.