ABSTRACT

Image-based techniques for object recognition have recently been developed to recognize a specific three-dimensional object after a ‘learning’ stage, in which a few two-dimensional views of the object are used as training examples (Poggio and Edelman, 1990; Edelman and Poggio, 1992). A theoretical lower bound on the number of views is provided by the 1.5-views theorem (Poggio, 1990; Ullman and Basri, 1991; for more details see Section 2.1 of this paper). In the orthographic case, this theorem implies that two views, defined in terms of pointwise features, are sufficient for recognition or, equivalently, for defining the affine structure of an object (see also Koenderink and van Doorn, 1991). It is known that, in the case of perspective projection, two views are sufficient to compute projective invariants specific to the object (Faugeras, 1992; Hartley et al., 1992; Shashua, 1993). Under more general conditions (a more general definition of ‘view’, non-uniform transformations, etc.) and depending on the implementation, many more views may be required (Poggio and Edelman’s estimate is of the order of 100 for the whole viewing sphere using their approximation network).
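
The orthographic claim can be checked numerically. The following is a minimal sketch, not the implementation used in the cited work. It assumes noise-free point correspondences, no translation, and a hypothetical six-point object. The helper names (`rotation`, `project`, `solve3`) are illustrative only. It takes the x and y coordinates of one view plus the x coordinates of a second view ("1.5 views") and shows that the coordinates of a third, novel view are linear combinations of them.

```python
import math

def rotation(ax, ay, az):
    """3x3 rotation matrix Rz(az) * Ry(ay) * Rx(ax), as nested lists."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def project(R, points):
    """Orthographic view: rotate each 3D point, then drop the depth coordinate."""
    xs = [sum(R[0][k] * p[k] for k in range(3)) for p in points]
    ys = [sum(R[1][k] * p[k] for k in range(3)) for p in points]
    return xs, ys

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve the 3x3 system A c = b by Cramer's rule."""
    d = det3(A)
    coeffs = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        coeffs.append(det3(Aj) / d)
    return coeffs

# Hypothetical object: six feature points in 3D (an assumption for illustration).
points = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
          [1, 1, 1], [-1, 2, 0.5], [0.3, -0.7, 2.0]]

x1, y1 = project(rotation(0.0, 0.0, 0.0), points)    # first model view
x2, _ = project(rotation(0.2, 0.5, -0.1), points)    # only x of the second view
xn, yn = project(rotation(-0.4, 0.3, 0.6), points)   # novel view to be explained

# Per-point basis (x1, y1, x2); fit coefficients on the first three points only.
basis = [list(row) for row in zip(x1, y1, x2)]
coef_x = solve3(basis[:3], xn[:3])
coef_y = solve3(basis[:3], yn[:3])

# Predict the novel view for all six points and measure the worst-case error.
pred_x = [sum(c * b for c, b in zip(coef_x, row)) for row in basis]
pred_y = [sum(c * b for c, b in zip(coef_y, row)) for row in basis]
max_err = max(abs(p - t) for p, t in zip(pred_x + pred_y, xn + yn))
print(max_err)
```

The coefficients are fitted on three points only, yet they reproduce the novel view at the remaining points up to floating-point rounding. With translation, a constant term would have to be added to the basis.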