ABSTRACT

The rapid growth of large-scale video recording and sharing has created a growing need for robust and scalable solutions for analyzing video content. Detecting and recognizing video events that capture real-world activities is one of the key and most complex problems in this area. This chapter aims to develop robust and efficient solutions for large-scale video event detection using a state-of-the-art deep learning approach. In particular, we investigate event detection with automatically discovered event-specific concepts organized in an ontology. Specifically, we built a large-scale event-specific concept library named EventNet that covers as many real-world events and their concepts as possible. After an automatic filtering process, we end up with 95,321 videos and 4,490 concepts in 500 event categories. A deep neural network has been trained on the 500 event categories. To the best of our knowledge, EventNet is the first video event ontology that organizes events and their concepts into a semantic structure. It offers great potential for event retrieval and browsing. The EventNet system is the first to allow users to explore the rich hierarchical structure among video events, the relations between concepts and events, and the automatic detection of events and concepts in user-uploaded videos in a live fashion.