ABSTRACT

Many applications in the field of computer vision (CV) require the robust identification and tracking of distinctive feature points in monocular image sequences acquired by a moving camera. Prominent examples of such applications are 3D scene modelling following the structure-from-motion (SfM) principle or simultaneous localisation and mapping (SLAM) for mobile robot applications. The general procedure of feature point tracking can be subdivided into two distinct phases:

• Detection - The first stage is the identification of a set of distinctive point features X_k = {x_1, …, x_n}, with x_i = (x, y)^T, in image I_k, e.g. based on computing the cornerness of each pixel (see [10]). At this stage each feature point is typically assigned some kind of descriptor θ_Ik(x_i), which is used in the second stage for the re-identification of the feature. This descriptor could be a simple local neighbourhood of pixels around x_i.
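The detection stage described above can be sketched as follows. This is a minimal illustration, not the method of [10]: it uses a Harris-style cornerness measure with an assumed 3x3 smoothing window and sensitivity k = 0.04, and a plain pixel patch as the descriptor; all function names and parameters are illustrative.

```python
import numpy as np

def harris_cornerness(img, k=0.04):
    """Harris-style cornerness R = det(M) - k * trace(M)^2 per pixel (illustrative)."""
    # Image gradients via central differences (axis 0 = y, axis 1 = x).
    Iy, Ix = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box3(a):
        # 3x3 box filter: average of the nine shifted copies of the array.
        p = np.pad(a, 1, mode="edge")
        h, w = a.shape
        return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

    # Smoothed structure tensor M = [[Sxx, Sxy], [Sxy, Syy]] at every pixel.
    Sxx, Syy, Sxy = box3(Ixx), box3(Iyy), box3(Ixy)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace ** 2

def detect_features(img, n=100):
    """Return the n pixels with the highest cornerness as x_i = (x, y) tuples."""
    R = harris_cornerness(img)
    ys, xs = np.unravel_index(np.argsort(R, axis=None)[::-1][:n], R.shape)
    return list(zip(xs.tolist(), ys.tolist()))

def patch_descriptor(img, x, y, r=3):
    """Simplest possible descriptor theta_Ik(x_i): the local pixel neighbourhood."""
    return img[y - r:y + r + 1, x - r:x + r + 1].copy()
```

On a synthetic image containing a bright square, the cornerness response is positive at the square's corners, near zero in flat regions, and negative along straight edges, so the corners dominate the ranking in `detect_features`. In practice the raw ranking would additionally be filtered by non-maximum suppression so that selected features do not cluster around a single strong corner.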

As shown by Aufderheide et al. (2009) [1], there are many ways in which a feature tracking method can fail completely or produce a non-negligible number of wrong matches. From a mathematical point of view, this corresponds to the underlying optimisation problem either converging to a local minimum or not converging at all.