ABSTRACT

With the ubiquitous presence of video data and the growing importance of real-world applications such as visual surveillance, it is becoming increasingly necessary to analyze and understand human motion automatically from large amounts of video data. Machine learning for vision-based motion analysis is the research field that aims to bring together aspects of motion analysis, such as detection, tracking, and object identification, with statistical machine learning techniques. In this chapter we address the problem of silhouette-based human action modeling and recognition independently of the camera point of view. This ability is crucial for deployment in practical CCTV systems, where it would be impossible to train each camera to recognize actions for its particular point of view and where many cameras are of the pan-tilt-zoom (PTZ) type, that is, they can be moved under either computer or human control. There are two main approaches to human pose modeling: model-based top-down and model-free bottom-up strategies. Model-based approaches presuppose an explicit model of the person and essentially match a projection of the human body model against the image observation. Bottom-up methods do not use such an explicit representation but infer the human pose or action directly from previously extracted image features, typically by following an example-based or learning-based approach over a dataset of exemplars.
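To make the model-free, example-based route concrete, the following minimal sketch (not taken from the chapter) labels a query silhouette descriptor by nearest-neighbor matching against a dataset of exemplars; the descriptor extraction step is assumed to produce fixed-length feature vectors, and all names and data here are illustrative.

    # Illustrative sketch: example-based action recognition by 1-nearest-neighbor
    # matching of silhouette descriptors against labeled exemplars.
    import numpy as np

    def recognize_action(query_descriptor, exemplar_descriptors, exemplar_labels):
        """Return the action label of the closest exemplar (Euclidean distance)."""
        distances = np.linalg.norm(exemplar_descriptors - query_descriptor, axis=1)
        return exemplar_labels[int(np.argmin(distances))]

    # Toy usage with random stand-in descriptors (32-D features, 100 exemplars).
    rng = np.random.default_rng(0)
    exemplars = rng.normal(size=(100, 32))
    labels = rng.choice(["walk", "wave", "bend"], size=100)
    query = rng.normal(size=32)          # descriptor of an unseen silhouette
    print(recognize_action(query, exemplars, labels))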