ABSTRACT

This chapter presents an approach to perform semantic segmentation on very high resolution remote sensing images using a deep convolutional neural network. It discusses a patch based classification approach, that is suited to sparsely annotated data. However, this kind of approach is limited in term of semantic precision in spatial location. The chapter describes one of the first popular architecture based on Fully Convolutional Neural Networks, the U-Net. It is an encoder-decoder architecture. The encoder reduces progressively the spatial resolution with strided convolution or pooling layers, increasing the features dimension, and the decoder progressively recovers spatial resolution in decreasing features dimension. Finally, the last feature map is the predicted class.