What do We Learn by Semantic Scene Understanding for Remote Sensing Imagery in CNN Framework?

Figure 1. Samples with different scene complexity scores


Recently, deep convolutional neural networks (DCNNs) have achieved remarkable success and developed rapidly in the field of natural image recognition. Consequently, a growing number of studies attempt to apply DCNNs to remote sensing imagery understanding. However, compared with natural images, remote sensing images are larger in scale, and the scenes and objects they represent are more macroscopic. How to better understand the mechanism of DCNNs and adapt them to the remote sensing field remains unclear. Inspired by the process of human visual perception, we combine network depth and receptive field with scene complexity to explore what roles they play in remote sensing recognition tasks. Experiments show that remote sensing scene understanding depends on a specific network depth and receptive field for a given scene complexity. Using a visualization method, we qualitatively and quantitatively analyze the recognition mechanism and demonstrate the importance of multi-objective joint semantic support in complex remote sensing scenes.

Construction of complexity dataset

Based on existing remote sensing datasets, we select AID and choose 22 scene categories whose complexity is more distinguishable as our basic dataset. The dataset contains 360 samples per class; each sample is a 600×600 RGB image. The dataset is divided into three super-classes: low, moderate, and high complexity. To objectively evaluate the scene complexity of the samples, we invited 10 volunteers to randomly select 10 to 15 samples and score them from 1 to 10 (Figure 1) according to their level of complexity. In this paper, we define scores from 1 to 4 as low complexity, 4 to 7 as moderate complexity, and 8 to 10 as high complexity, and then add up the scores to obtain a final score that grades the complexity of each scene.
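The grading rule can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the text does not settle: a score of exactly 4 is treated as low, scores strictly between 7 and 8 are treated as moderate, and the summed volunteer scores are divided by the number of scores so that the result stays on the 1 to 10 scale before the thresholds are applied. The function names `complexity_level` and `grade_scene` are hypothetical.

```python
from typing import Sequence, Tuple


def complexity_level(score: float) -> str:
    """Map a score on the 1-10 scale to a complexity super-class.

    Assumptions (not stated in the text): a score of exactly 4 counts as
    low, and scores between 7 and 8 count as moderate.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must lie in [1, 10], got {score}")
    if score <= 4:
        return "low"
    if score < 8:
        return "moderate"
    return "high"


def grade_scene(volunteer_scores: Sequence[float]) -> Tuple[float, str]:
    """Add up volunteer scores and grade the scene.

    Returns the summed score together with the complexity level of the
    mean score. Using the mean is an assumption made here so that the
    1-10 thresholds remain applicable to the aggregated value.
    """
    if not volunteer_scores:
        raise ValueError("at least one score is required")
    total = sum(volunteer_scores)
    mean_score = total / len(volunteer_scores)
    return total, complexity_level(mean_score)


if __name__ == "__main__":
    # Hypothetical scores for one scene, for illustration only.
    total, level = grade_scene([2, 3, 3])
    print(total, level)
```

Keeping the boundary handling in a single function makes it easy to adjust once the exact treatment of scores at 4 and between 7 and 8 is known.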