Contents

1.1 Introduction
    1.1.1 Local Spatiotemporal Features for Video CBCD
    1.1.2 Overview
1.2 Efficient Local Spatiotemporal Interest Point
    1.2.1 Short Overview of Spatiotemporal Interest Points
    1.2.2 Hessian-Based STIP Detector
    1.2.3 Hessian-Based STIP Descriptor
    1.2.4 Adaptations for Video CBCD
1.3 Disk-Based Pipeline for Efficient Indexing and Retrieval
    1.3.1 Pipeline Overview
    1.3.2 Indexing and Retrieving High-Dimensional Data
    1.3.3 Disk-Based p-Stable LSH
1.4 Experimental Evaluation
    1.4.1 Setting Up the Experiment
    1.4.2 Comparison between Spatial and Spatiotemporal Features
    1.4.3 Robustness to Transformations
    1.4.4 Video Copy Detection from Handheld Shooting
1.5 Application: Monitoring Commercial Airtime
1.6 Conclusions
Acknowledgments
References

ABSTRACT

Detecting (near) copies of video footage in streams or databases has several useful applications. Reuse of video material by different data aggregators can serve as a guide for linking stories across channels, while finding copies of copyrighted video fragments is of great interest to their rightful owners. Most methods today are based on global descriptors and fingerprinting techniques, as these can be computed very efficiently. The downside, however, is that they are inherently ill-suited to detecting copies after transformations that change the overall appearance, such as the addition of logos and banners, picture-in-picture, and cropping. To cope with such variations, methods based on local features have been proposed. Most robust content-based copy detection (CBCD) systems proposed to date share one trait: they initially rely on detecting two-dimensional (2D) interest points on specific or all frames and use temporal information only in a second step (if at all). Here we propose the use of a recently developed spatiotemporal feature detector and descriptor based on the Hessian matrix. We explain the theory behind the features as well as the implementation details that allow for efficient computation. We further present a disk-based pipeline for efficient indexing and retrieval of video content based on two-stable locality-sensitive hashing (LSH). Through experiments, we show the increased robustness of spatiotemporal features compared with purely spatial features. Finally, we apply the proposed system to faster-than-real-time monitoring of 71 commercials in 5 h of TV broadcasting.
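
As a rough illustration of the two-stable LSH idea mentioned above, the sketch below builds the standard hash family h(v) = floor((a . v + b) / w), where a is drawn from a Gaussian (a 2-stable distribution) and b is a uniform offset in [0, w). It is a minimal in-memory sketch, not the disk-based pipeline of this chapter. The function names and the values of `dim`, `k`, and `w` are assumptions chosen only for illustration.

```python
import math
import random


def make_hash(dim, w, rng):
    """Draw one 2-stable LSH function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian entries: 2-stable
    b = rng.uniform(0.0, w)                        # random offset within one bucket width

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def make_key_function(dim, k, w, seed=0):
    """Concatenate k hash values into one bucket key (hypothetical helper)."""
    rng = random.Random(seed)
    hashes = [make_hash(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hashes)


# Illustrative values only; real descriptors have many more dimensions.
key = make_key_function(dim=8, k=4, w=4.0, seed=42)
descriptor = [0.5, -1.2, 0.3, 2.0, 0.0, 1.1, -0.7, 0.9]
near_copy = [x + 0.01 for x in descriptor]

print(key(descriptor))
print(key(near_copy))  # likely, though not guaranteed, to land in the same bucket
```

Descriptors that are close in Euclidean distance collide in the same bucket with high probability, so a query only needs to be compared against the contents of its own bucket rather than the whole database.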