ABSTRACT

The perceptual principles that allow people to organize visual objects into larger entities, or groups, are known as the Gestalt laws of perception. Two well-known principles of perceptual grouping are proximity and similarity: objects that lie close together are perceived as belonging to a group, and objects of similar shape, size, or color are more likely to form a group than objects that differ along these dimensions. While the primary function of these “laws” is to help us perceive the world, they also enter into our communication. People rely on assumptions about each other’s perception of the world to simplify discourse: for example, we often refer to a collection of objects simply by gesturing in its direction and uttering “those.” The current work describes an algorithm that simulates parts of the visual grouping mechanism at the object level. The system uses feature spaces and simple ranking methods to produce object groupings. The computational aspects of the system are described in detail, and its uses for enhancing multi-modal interfaces are explained.