ABSTRACT

Modeling a scene background and finding the discrepancies between each new frame and the model to determine the foreground objects, i.e., background subtraction, is a common approach to detecting moving objects in a video scene, and it is a critical component of many video object tracking systems. Segmenting the foreground objects from the scene background is a challenging task, especially given the dynamically changing nature of most real-world scene backgrounds due to variations in illumination and dynamic scene elements such as wind-blown trees, water ripples, and reflections. Many methods have been proposed in the literature to model a video scene background, from modeling the image pixels as a Mixture of Gaussians (MOG) [36] to non-parametric kernel density estimation and artificial neural network based approaches. In [36], where the MOG approach was proposed, the number of mixture components was kept fixed at an experimentally determined value; various updates and refinements of this method have since been proposed. In [20], the MOG model is improved by a faster initialization and updating process and by a shadow removal step based on a chromatic color space. In [43], the number of mixture components, up to a maximum of four, is estimated using Dirichlet priors.

Alternatively, a non-parametric kernel density estimation (KDE) approach to background subtraction, which adapts both short-term and long-term models to handle quick and slow background changes, was proposed in [8]. The local texture around a pixel is taken into account in [16], where pixels are modeled as adaptive local binary pattern histograms computed over a circular area around each pixel. A recent approach that takes neighborhood pixel values into account to build the background model was proposed in [1]; there, foreground detection is treated as a classification process, the background pixels are not modeled with a probability density function, and a random sampling strategy is used to update the exemplars stored for each pixel. This method is not sensitive to small camera displacements, noise, or ghost objects. The algorithm presented in [25] takes a self-organizing approach to background subtraction based on artificial neural networks: each pixel is modeled with a neuronal map of weight vectors, and the weight vector closest to an incoming pixel, along with its neighborhood, is updated over time.
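For reference, the per-pixel model at the core of this family of MOG methods is the standard mixture formulation below; the notation (K components with weights w_{k,t}, means \mu_{k,t}, and covariances \Sigma_{k,t} at time t) is generic rather than quoted from any of the cited works:

    P(x_t) = \sum_{k=1}^{K} w_{k,t} \, \mathcal{N}\!\left(x_t;\, \mu_{k,t}, \Sigma_{k,t}\right), \qquad \sum_{k=1}^{K} w_{k,t} = 1.

Components with large weights and small variances are taken to describe the background, and an incoming pixel value that matches none of these components is labeled foreground while the component parameters are updated online.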
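As a practical sketch of how such a pixel-wise model is applied frame by frame, the Python snippet below uses OpenCV's cv2.createBackgroundSubtractorMOG2, a GMM-based subtractor with an optional shadow flag in the same family as the methods above; it is not an implementation of any specific cited algorithm, and the input file name is a placeholder.

import cv2

# GMM-based background subtractor; history and varThreshold control how
# quickly the model adapts and how strict the foreground test is, and
# detectShadows=True marks shadow pixels with a separate (gray) label.
subtractor = cv2.createBackgroundSubtractorMOG2(
    history=500, varThreshold=16, detectShadows=True)

cap = cv2.VideoCapture("scene.avi")  # placeholder input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Update the per-pixel mixture model and obtain the foreground mask
    # (255 = foreground, 127 = shadow, 0 = background).
    fg_mask = subtractor.apply(frame)
    # Keep only confident foreground pixels and remove small noise.
    _, fg = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    fg = cv2.morphologyEx(
        fg, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
    cv2.imshow("foreground", fg)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()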
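The exemplar-based classification and random update described for [1] can be sketched for a single grayscale pixel as follows; the parameter names and values (matching radius, required match count, subsampling factor) are illustrative assumptions, not the authors' settings.

import random

def is_background(pixel, samples, radius=20, min_matches=2):
    # A pixel is background if it lies within `radius` of at least
    # `min_matches` of the stored exemplar samples (no density model).
    matches = sum(1 for s in samples if abs(int(pixel) - int(s)) < radius)
    return matches >= min_matches

def update_samples(pixel, samples, subsampling=16):
    # Conservative random update: with probability 1/subsampling,
    # overwrite a randomly chosen exemplar with the current pixel value.
    if random.randrange(subsampling) == 0:
        samples[random.randrange(len(samples))] = int(pixel)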