Recent Advances in Video Captioning with Object Detection

doi:10.1201/9781003393658-1

ABSTRACT

Object detection, a core area of computer vision, has substantially advanced other computer vision tasks, ranging from fine-grained classification to captioning. In the deep learning era, object detection methods can be broadly divided into two types: (i) two-stage region proposal-based methods and (ii) single-stage regression-based methods. In this chapter, we first give an overview of both types. Our primary focus, however, is the second part of the chapter, which describes the advances in video captioning made possible by improved object detectors.