ABSTRACT

Previous work in visual cognition has extensively explored the power of parts-based representations of objects for recognition, categorization, and functional reasoning. We propose a novel parts-based representation of objects, where the parts of an object are found by grouping together object elements that move together over a set of images. The distribution of object configurations is then succinctly described in terms of these functional parts and an orthogonal set of modal transformations of these parts. If the distribution has a natural set of principal axes, the computed modes are stable and functionally significant. Moreover, the representation is always unique and robustly computable because it does not rely critically on the properties of any particular element in any particular instance of the object. Most importantly, the representation provides a set of direct cues to object functionality without making any assumptions about object geometry or invoking any high-level domain knowledge. This robustness and functional transparency may be contrasted with standard representations based on geometric parts, such as generalized cylinders (Marr and Nishihara, 1978) or geons (Biederman, 1987), which are sensitive to accidental alignments and occlusions (Biederman, 1987), and which only support functional reasoning in conjunction with high-level domain knowledge (Tversky and Hemenway, 1984).
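The two computational steps summarized above, grouping elements that move together over a set of images and describing the configuration distribution by its principal modes, can be illustrated with a minimal sketch. The sketch below is not the paper's implementation: it assumes each object instance is given as a flat vector of 2-D element positions, uses ordinary PCA as a stand-in for the modal decomposition, and the function names (`group_elements_by_motion`, `modal_decomposition`) and the correlation threshold are hypothetical.

```python
# Minimal sketch (illustrative only, not the authors' method), assuming each
# object instance is a flat vector of 2-D element positions.
import numpy as np

def group_elements_by_motion(configs, threshold=0.9):
    """Greedily group elements whose displacements correlate above a threshold
    across instances, i.e. elements that 'move together'."""
    n_elems = configs.shape[1] // 2
    disp = configs - configs.mean(axis=0)
    # Summarize each element's motion by concatenating its x and y displacement series.
    motion = np.stack(
        [np.concatenate([disp[:, 2 * i], disp[:, 2 * i + 1]]) for i in range(n_elems)]
    )
    corr = np.corrcoef(motion)
    parts, assigned = [], set()
    for i in range(n_elems):
        if i in assigned:
            continue
        members = [j for j in range(n_elems) if j not in assigned and corr[i, j] >= threshold]
        assigned.update(members)
        parts.append(members)
    return parts

def modal_decomposition(configs):
    """Principal axes of the configuration distribution, used here as a
    stand-in for the modal transformations described in the abstract."""
    mean = configs.mean(axis=0)
    centered = configs - mean
    _, singular_values, modes = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / max(len(configs) - 1, 1)
    return mean, modes, variances

if __name__ == "__main__":
    # Synthetic example: 4 elements; the last two translate together (one part),
    # the first two stay near fixed positions (two singleton parts).
    rng = np.random.default_rng(0)
    base = np.array([0.0, 0.0, 1.0, 0.0, 3.0, 0.0, 4.0, 0.0])
    shifts = rng.normal(size=(50, 1))
    configs = base + np.hstack([np.zeros((50, 4)), np.repeat(shifts, 4, axis=1)])
    configs += rng.normal(scale=0.01, size=configs.shape)
    print("parts:", group_elements_by_motion(configs))
    mean, modes, variances = modal_decomposition(configs)
    print("dominant mode variance:", variances[0])
```

On the synthetic data, the grouping step recovers the jointly translating pair as one part, and the dominant principal axis captures that shared translation, which is the sense in which stable modes correspond to functionally significant parts.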