ABSTRACT

This chapter presents a brief overview of recent developments in object detection using convolutional neural networks (CNNs) and describes several classical CNN-based detectors. It also compares the performance of different models on several benchmark datasets. As available computing power has grown, CNNs have become progressively deeper. The chapter introduces the region-based CNN (R-CNN), Fast R-CNN, and Faster R-CNN models, and discusses the improvements each model introduces. The R-CNN model uses selective search, which takes the entire image as input and generates around 2,000 class-independent region proposals. In principle, R-CNN can work with any region proposal method; selective search was chosen because it performs well and had already been employed by other detection models. Before Fast R-CNN, the second stage, in which CNN features are computed for each proposal and then classified, took significantly more time than the region proposal stage.