A Survey of Geometric Vision

doi:10.1201/9781315220352-25

ABSTRACT

Vision is one of the most powerful sensing modalities. In robotics, machine vision techniques have been used extensively in applications such as manufacturing, visual servoing [7, 30], navigation [9, 26, 50, 51], and robotic mapping [58]. The central problem in these applications is to reconstruct both the pose of the camera and the three-dimensional (3-D) structure of the scene, which requires a good understanding of the geometry of image formation and 3-D reconstruction. In this chapter, we survey the basic theory and some recent advances in the geometric aspects of the reconstruction problem. Specifically, we introduce the theory and algorithms for reconstruction from two views (e.g., see [29, 31, 33, 40, 67]), multiple views (e.g., see [10, 12, 23, 37, 38, 40]), and a single view (e.g., see [1, 3, 19, 25, 28, 70, 73, 74]). Since this chapter can provide only a brief introduction to these topics, the reader is referred to the book [40] for a more comprehensive treatment.

Without any knowledge of the environment, reconstructing a scene requires multiple images, because a single image is merely a 2-D projection of the 3-D world in which depth information is lost. When multiple images are available from different, known viewpoints, the 3-D location of every point visible in at least two of the images can be determined uniquely by triangulation (or stereopsis). However, in many applications (especially in robot vision), the viewpoints are also unknown. Therefore, we need to recover both the scene structure
and the camera poses. In the computer vision literature, this is referred to as the “structure from motion” (SFM) problem. To solve this problem, the theory of multiple-view geometry has been developed (e.g., see [10, 12, 23, 33, 37, 38, 40, 67]). In this chapter, we introduce the basic theory of multiple-view geometry and show how it can be used to develop reconstruction algorithms. Specifically, for the two-view case, we introduce in Section 22.2 the epipolar constraint and the eight-point structure-from-motion algorithm [29, 33, 40]. For the multiple-view case, we introduce in Section 22.3 the rank conditions on the multiple-view matrix [27, 37, 38, 40] and a multiple-view factorization algorithm [37, 40].
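To make the two-view ideas above concrete, here is a minimal sketch in plain Python. It assumes a calibrated camera with normalized image coordinates x = [x, y, 1], places the first camera at the origin, and relates the second by X2 = R X1 + T. Under this assumed convention, a correspondence satisfies the epipolar constraint x2^T (T^ R) x1 = 0, where T^ is the skew-symmetric matrix of T. The helper names (project, hat, epipolar_residual, triangulate), the pose values, and the scene point are illustrative assumptions, not taken from the chapter. Triangulation is done here by a simple least-squares depth recovery from lambda2 x2 = lambda1 R x1 + T, which is one of several standard variants and not necessarily the method developed in Section 22.2.

```python
import math

# Minimal 3x3 linear-algebra helpers on nested lists (no external dependencies).
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def hat(t):
    """Skew-symmetric matrix with hat(t) v == t x v (cross product)."""
    return [[0.0, -t[2], t[1]],
            [t[2], 0.0, -t[0]],
            [-t[1], t[0], 0.0]]

def rot_y(theta):
    """Rotation by theta radians about the y-axis (illustrative choice of R)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def project(X):
    """Calibrated perspective projection onto the plane z = 1.

    Every point lambda * X with lambda > 0 maps to the same image point,
    which is why depth cannot be recovered from a single view.
    """
    return [X[0] / X[2], X[1] / X[2], 1.0]

def epipolar_residual(x1, x2, R, T):
    """Evaluate x2^T (hat(T) R) x1; zero for an exact correspondence."""
    E = matmul(hat(T), R)  # essential matrix under the assumed convention
    return dot(x2, matvec(E, x1))

def triangulate(x1, x2, R, T):
    """Least-squares depth recovery from lambda2 x2 = lambda1 R x1 + T.

    Minimizes ||lambda1 a - lambda2 b + T||^2 with a = R x1, b = x2 and
    returns the 3-D point lambda1 x1 in the first camera frame.
    """
    a, b = matvec(R, x1), x2
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    aT, bT = dot(a, T), dot(b, T)
    det = aa * bb - ab * ab  # equals |a x b|^2, so it vanishes for parallel rays
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is not recoverable")
    lam1 = (ab * bT - aT * bb) / det
    return [lam1 * c for c in x1]

# Usage: synthesize a noise-free correspondence, then check and invert it.
X = [0.5, -0.2, 4.0]                 # hypothetical scene point in camera-1 coordinates
R, T = rot_y(0.1), [-1.0, 0.1, 0.2]  # hypothetical relative pose of camera 2
x1 = project(X)
x2 = project([p + t for p, t in zip(matvec(R, X), T)])
print(epipolar_residual(x1, x2, R, T))
print(triangulate(x1, x2, R, T))
```

With the pose known, the two rays intersect and the point is recovered; with only x1, any point along its ray is equally consistent. When R and T are unknown, as in the SFM setting, they must first be estimated from many correspondences, which is what the epipolar constraint makes possible.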