Visual Search for Objects in a Complex Visual Context: What We Wish to See

doi:10.1201/b17080-2

ABSTRACT

This chapter addresses the problem of recognizing and classifying objects in so-called “egocentric video,” that is, video recorded by cameras worn by people. It proposes using visual saliency to detect active regions within video frames and examines an improvement to the saliency model obtained by adding a third cue, called geometric saliency. Object recognition in visual media remains one of the most challenging problems to be solved in building intelligent systems for multimedia data mining. Probably the best-known global color descriptor is the color histogram, which represents the distribution of colors within an image or a region of an image. The Scale Invariant Feature Transform (SIFT) has proven to be a powerful feature in many computer vision applications; it was designed to match different images or objects of a scene.
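
To make the notion of a color histogram concrete, the following is a minimal sketch in plain Python, not the chapter's own implementation. It assumes a region is given as a list of 8-bit RGB triples and that each channel is quantized uniformly; the function name `color_histogram` and the parameter `bins_per_channel` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed layout: (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized joint RGB histogram of the given pixels.

    The result has bins_per_channel ** 3 entries that sum to 1.0
    (or are all zero when no pixels are given).
    """
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be between 1 and 256")

    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    if not pixels:
        return hist

    for r, g, b in pixels:
        # Uniform quantization: map a 0..255 value to a bin index in 0..n-1.
        ri = r * n // 256
        gi = g * n // 256
        bi = b * n // 256
        # Flatten the 3-D bin coordinate into a single index.
        hist[(ri * n + gi) * n + bi] += 1.0

    # Normalize counts so the histogram describes a distribution of colors,
    # independent of the size of the region.
    total = float(len(pixels))
    return [count / total for count in hist]


# Usage: a tiny region with three pure-red pixels and one pure-blue pixel.
region = [(255, 0, 0)] * 3 + [(0, 0, 255)]
hist = color_histogram(region, bins_per_channel=2)
```

Because the counts are normalized, two regions of different sizes can be compared directly, for example with histogram intersection. The descriptor is global in the sense that it discards where in the region each color occurs.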