ABSTRACT

In recent years, the need for security-oriented surveillance systems has grown steadily. Many public environments, such as airports and train stations, are now monitored by some form of video-surveillance system in order to detect or prevent security issues. The technology involved ranges from plain closed-circuit television (CCTV) cameras to sophisticated computer-based video processing systems. The CCTV approach was the only feasible choice in the past and is still widely used; however, its limits are becoming increasingly evident: the growth in the number of sensors (modern surveillance systems can use hundreds of cameras) is often not matched by an adequate number of human operators, whose attention is spread over many different tasks and quickly decreases over time. Modern computer-based systems address these problems with automatic video analysis and understanding techniques, which highlight only the potential security issues and thus require the attention of a human operator in only a limited number of cases. Research in this field has been very active and has produced many techniques for video analysis and interpretation, yet many works are limited to static cameras. Only recently has the research community started focusing on more sophisticated sensors such as Pan-Tilt-Zoom (PTZ) cameras, and very few works have considered the advantages of heterogeneous sensors, such as the cooperation of audio and video sensors. In particular, microphones offer several advantages, such as wide omnidirectional coverage and low price, and could thus be particularly useful for covering large environments. Audio data is, of course, less discriminative than video for a human operator required to classify the detected activities; this is why the simultaneous, coordinated use of both audio and video sensors could lead to a real advancement in the field of surveillance systems. This work proposes a novel audio- and video-based surveillance system in which audio sensors cover large portions of the environment and the detected audio sources are further analyzed by means of PTZ cameras.