
A Survey of Deep Learning Applications in Remote Sensing


 


【Overview】
Deep learning, as a major breakthrough in its field, has proven to be a very powerful tool in many domains. Should we regard deep learning as the key to everything? Or should we resist black-box solutions? Controversial views exist within the remote sensing community. In this article, we analyze the challenges that deep learning faces in remote sensing data analysis, review recent advances, and provide resources that make deep learning in remote sensing easy to get started with. More importantly, we advocate that remote sensing scientists bring their expertise into deep learning and use it as an implicit, general model to tackle unprecedented, large-scale, impactful challenges such as climate change and urbanization.
IEEE GEOSCIENCE AND REMOTE SENSING MAGAZINE, IN PRESS

A next step is to develop novel architectures for matching images taken from different perspectives, and even from different imaging modalities, preferably without requiring an existing 3D model. Also, besides conventional decision fusion, an alternative is to investigate the transferability of trained networks to other imaging modalities.

Remote sensing data are geo-located, i.e., they are naturally situated in geographical space. Each pixel corresponds to a spatial coordinate, which facilitates the fusion of pixel information with other sources of data, such as GIS layers, geo-tagged images from social media, or simply other sensors (as above). On the one hand, this allows data fusion with non-traditional data modalities; on the other hand, it opens the field to new applications, such as picture localization, location-based services, or augmented reality.

Remote sensing data are geodetic measurements with controlled quality. This enables us to retrieve geo-parameters with confidence estimates. However, differently from purely data-driven approaches, the role of prior knowledge about sensor adequacy and data quality becomes even more crucial. For example, to retrieve topographic information, even at the same spatial resolution, interferograms acquired by a single-pass SAR system are considered more important than those acquired in a repeat-pass manner.

The time variable is becoming increasingly important in the field. The Copernicus program guarantees continuous data acquisition for decades; for instance, Sentinel-1 images the entire Earth every six days. This capability is triggering a shift from individual image analysis to time-series processing. Novel network architectures must be developed to optimally exploit the temporal information jointly with the spatial and spectral information of these data.

Remote sensing also faces the big data challenge.
In the Copernicus era, we are dealing with very large and ever-growing data volumes, often on a global scale. For example, even though they were only launched starting in 2014, the Sentinel satellites have already acquired about 25 petabytes of data. The Copernicus concept calls for global applications, i.e., algorithms must be fast enough and sufficiently transferable to be applied to the whole Earth surface. On the other hand, these data are well annotated and contain plenty of metadata; hence, in some cases, large training data sets might be generated (semi-)automatically.

In many cases, remote sensing aims at retrieving geo-physical or bio-chemical quantities rather than detecting or classifying objects. These quantities include mass movement rates, mineral composition of soils, water constituents, atmospheric trace gas concentrations, and terrain elevation or biomass. Often, process models and expert knowledge exist that are traditionally used as priors for the estimates. This particularity suggests that the so-far dogma of expert-free, fully automated deep learning should be questioned for remote sensing, and that physical models should be re-introduced into the concept, as, for example, in the concept of emulators [12].

Remote sensing scientists have exploited the power of deep learning to tackle these different challenges and started a new wave of promising research. In this paper, we review these advances. After the introductory Section II detailing deep learning models (with emphasis put on convolutional neural networks), we enter sections dedicated to advances in hyperspectral image analysis (Section III-A), synthetic aperture radar (Section III-B), very high resolution data (Section III-C), data fusion (Section III-D), and 3D reconstruction (Section III-E). Section IV then provides the tools of the trade for scientists willing to explore deep learning in their research, including open codes and data repositories.
Section V concludes the paper by giving an overview of the challenges ahead.

II. FROM PERCEPTRON TO DEEP LEARNING

The perceptron is the basis of the earliest NNs [13]. It is a bio-inspired model for binary classification that aims to mathematically formalize how a biological neuron works. In contrast, deep learning has provided more sophisticated methodologies to train deep NN architectures. In this section, we recall the classic deep learning architectures used in visual data processing.

A. Autoencoder models

1) Autoencoder and Stacked Autoencoder (SAE): An autoencoder [14] takes an input x ∈ R^D and first maps it to a latent representation h ∈ R^M via a nonlinear mapping

    h = f(Θx + β),    (1)

where Θ is a weight matrix to be estimated during training, β is a bias vector, and f(·) stands for a nonlinear function, such as the logistic sigmoid or the hyperbolic tangent. The encoded feature representation h is then used to reconstruct the input x by a reverse mapping, leading to the reconstructed input y:

    y = f(Θ′h + β′),    (2)

where Θ′ is usually constrained to be of the form Θ′ = Θ^T, i.e., the same weights are used for encoding the input and decoding the latent representation. The reconstruction error is defined as the Euclidean distance between x and y, which is constrained to approximate the input data (i.e., making ‖x − y‖² → 0). The parameters of the autoencoder are generally optimized by stochastic gradient descent (SGD).

An SAE is a neural network consisting of multiple layers of autoencoders, in which the outputs of each layer are wired to the inputs of the following one.

2) Sparse autoencoder: The conventional autoencoder relies on the dimension of the latent representation h being smaller than that of the input, i.e., M < D, which means that it tends to learn a low-dimensional, compressed representation. However, when M > D, one can still discover interesting structures by enforcing a sparsity constraint on the hidden units.
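As a minimal numerical sketch of Eqs. (1) and (2), the snippet below encodes and decodes a single sample with a logistic sigmoid and tied weights (Θ′ = Θ^T). The dimensions, random weights, and sample are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
D, M = 8, 3                          # input and latent dimensions (illustrative)
Theta = rng.normal(0, 0.1, (M, D))   # encoder weight matrix
beta = np.zeros(M)                   # encoder bias
beta_p = np.zeros(D)                 # decoder bias

x = rng.normal(size=D)               # a single input sample

h = sigmoid(Theta @ x + beta)        # Eq. (1): encode to the latent representation
y = sigmoid(Theta.T @ h + beta_p)    # Eq. (2): decode with tied weights

recon_error = np.sum((x - y) ** 2)   # squared Euclidean reconstruction error
```

Training would adjust Θ and the biases by SGD to drive `recon_error` toward zero over the data set.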
Formally, given a set of unlabeled data X = {x¹, x², …, x^N}, training a sparse autoencoder [15] boils down to finding the optimal parameters by minimizing the following loss function:

    E = Σ_{i=1}^{N} J(x^i, y^i; Θ, β) + λ Σ_{j=1}^{M} KL(ρ‖ρ̂_j),    (3)

where J(x^i, y^i; Θ, β) is an average sum-of-squares error term, which represents the reconstruction error between the input x^i and its reconstruction y^i. KL(ρ‖ρ̂_j) is the Kullback–Leibler (KL) divergence between a Bernoulli random variable with mean ρ and a Bernoulli random variable with mean ρ̂_j. The KL divergence is a standard function for measuring how similar two distributions are:

    KL(ρ‖ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)).    (4)

In the sparse autoencoder model, the KL divergence is a sparsity penalty term, and λ controls its importance. ρ is a free parameter corresponding to a desired average activation value, and ρ̂_j indicates the average activation value of hidden neuron h_j over the training samples. Similar to the autoencoder, the optimization of a sparse autoencoder can be achieved via back-propagation and SGD.

3) Restricted Boltzmann Machine (RBM) & Deep Belief Network (DBN): Unlike deterministic network architectures, such as autoencoders or sparse autoencoders, an RBM (cf. Fig. 1) is a stochastic undirected graphical model consisting of a visible layer and a hidden layer.

[Footnote: An activation corresponds to how much a region of the image reacts when convolved with a filter. In the first layer, for example, each location in the image receives a value that corresponds to a linear combination of the original bands and the filter applied. The higher this value, the more 'activated' the filter is on that region. When convolved over the whole image, a filter produces an activation map, i.e., the activation at each location where the filter has been applied.]

[Fig. 1: Schematic comparison of an autoencoder (left) versus a restricted Boltzmann machine (right).]
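To make the sparsity penalty of Eqs. (3) and (4) concrete, the sketch below computes KL(ρ‖ρ̂_j) for a batch of hidden-unit activations; the batch itself and the value of ρ are made up for illustration.

```python
import numpy as np

def kl_sparsity(rho, rho_hat):
    # Eq. (4): KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

rng = np.random.default_rng(1)
H = rng.uniform(0.01, 0.99, size=(100, 5))  # activations: 100 samples, 5 hidden units
rho = 0.05                                  # desired average activation (free parameter)
rho_hat = H.mean(axis=0)                    # average activation per hidden unit

# Summed over hidden units; in Eq. (3) this sum is weighted by lambda.
penalty = kl_sparsity(rho, rho_hat).sum()
```

The penalty is zero when every ρ̂_j equals ρ and grows as the average activations drift away from the target, which is what pushes most hidden units toward being inactive.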
There are symmetric connections between these two layers, and no connections exist within the hidden layer or the visible layer. The energy function of an RBM can be defined as follows:

    E(x, h) = −(x^T W h + b^T x + c^T h),    (5)

where W, b, and c are learnable weights. Here, the input x is also named the visible random variable, which is denoted as v in [16]. The joint probability distribution of the RBM is defined as

    p(x, h) = (1/Z) exp(−E(x, h)),    (6)

where Z is a normalization constant. The form of the RBM makes the conditional probability distributions computationally feasible when x or h is fixed.

The feature representation ability of a single RBM is limited. However, its real power emerges when a couple of RBMs are stacked, forming a DBN [16]. Hinton et al. [16] proposed a greedy approach that trains the RBM in each layer to efficiently train the whole DBN.

B. Convolutional neural networks (CNNs)

Deep neural networks have been under the spotlight in recent years. The leading model is the convolutional neural network (CNN), which learns filters performing convolutions in the image domain. Here, we briefly review some successful CNN architectures proposed in computer vision in recent years. For a comprehensive introduction to CNNs, we refer the reader to the excellent book by Goodfellow and colleagues [17].

[Fig. 2: Architecture of AlexNet, as shown in [2].]

1) AlexNet: In 2012, Krizhevsky et al. [2] created AlexNet, a large, deep convolutional neural network that won the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge). The year 2012 marked the first time a CNN was used to achieve a top-5 test error rate of 15.4%.
AlexNet (cf. Fig. 2) scaled the insights of LeNet [18] into a deeper and much larger network that could be used to learn the appearance of more numerous and more complicated objects. The contributions of AlexNet are as follows:

  • Using rectified linear units (ReLU) as nonlinearity functions, which decrease training time, as a ReLU is several times faster than the conventional hyperbolic tangent function.
  • Implementing dropout layers in order to avoid the problem of overfitting.
  • Using data augmentation techniques to artificially increase the size of the training set (and see a more diverse set of situations). To this end, the training patches are translated and reflected on the horizontal and vertical axes.

One of the keys to the success of AlexNet is that the model was trained on GPUs. Since GPUs offer many more cores than CPUs, they allow much faster training, which in turn allows one to use larger datasets and bigger images.

2) VGG Net: The design philosophy of the VGG nets [3] is simplicity and depth. In 2014, Simonyan and Zisserman created VGG nets that strictly use 3 × 3 filters with stride and padding of 1, along with 2 × 2 max-pooling layers with stride 2. The main points of the VGG nets are that they:

  • use filters with a small receptive field of 3 × 3, rather than larger ones (5 × 5 or 7 × 7, as in AlexNet);
  • have the same feature map size and number of filters in each convolutional layer of the same block;
  • increase the size of the features in the deeper layers, roughly doubling after each max-pooling layer;
  • use scale jittering as a data augmentation technique during training.

VGG is one of the most influential CNN models, as it reinforces the notion that CNNs with deeper architectures can promote hierarchical feature representations of visual data, which in turn improves classification accuracy.
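The VGG design arithmetic can be checked with the standard output-size formula for convolution and pooling layers; the 224 × 224 input size below is chosen for illustration. It shows why 3 × 3 convolutions with stride 1 and padding 1 preserve spatial size, while 2 × 2 max pooling with stride 2 halves it.

```python
# Standard output-size formula for a convolution or pooling layer:
#   out = (in + 2*pad - kernel) // stride + 1

def conv_out(size, kernel, stride, pad):
    return (size + 2 * pad - kernel) // stride + 1

# 3x3 conv, stride 1, pad 1: spatial size is preserved
after_conv = conv_out(224, kernel=3, stride=1, pad=1)          # 224
# 2x2 max pool, stride 2: spatial size is halved
after_pool = conv_out(after_conv, kernel=2, stride=2, pad=0)   # 112
```

Two stacked 3 × 3 convolutions see a 5 × 5 region of their input, so depth buys the receptive field of larger filters with fewer parameters, which is the core of the VGG argument.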
A drawback is that, to train such a model from scratch, one would need large computational power and a very large labeled training set.

3) ResNet: He et al. [4] pushed the idea of very deep networks even further by proposing the 152-layer ResNet, which won ILSVRC 2015 with an error rate of 3.6% and set new records in classification, detection, and localization through a single network architecture. In [4], the authors provide an in-depth analysis of the degradation problem, i.e., simply increasing the number of layers in plain networks results in higher training and test errors, and claim that it is easier to optimize the residual mapping in ResNet than the original, unreferenced mapping in conventional CNNs. The core idea of ResNet is to add shortcut connections that bypass two or more stacked convolutional layers by performing identity mapping; the shortcut is then added to the output of the stacked convolutions.

4) FCN: The fully convolutional network (FCN) [7] is the most important work in deep learning for semantic segmentation, which is the task of assigning a semantic label to every pixel in the image. To perform this task, the output of the CNN must be of the same pixel size as the input (contrary to the 'single class per image' of the aforementioned models). The FCN introduces many significant ideas:

  • End-to-end learning of the upsampling algorithm via an encoder/decoder structure that first downsamples the activations and then upsamples them again.
  • A fully convolutional architecture, which allows the network to take images of arbitrary size as input, since there is no fully connected layer at the end requiring a specific activation size.
  • Skip connections as a way of fusing information from different depths in the network for multi-scale inference.

[Fig. 3: Architecture of the FCN, as shown in [7].]
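The identity-shortcut idea behind ResNet (and, in spirit, the skip connections of the FCN) can be sketched with plain matrix products standing in for the stacked convolutions; all shapes and random weights below are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    # F(x): two stacked transforms (stand-ins for the 3x3 convolutions)
    f = relu(W2 @ relu(W1 @ x))
    # Identity shortcut: x is added to F(x) before the final nonlinearity
    return relu(f + x)

rng = np.random.default_rng(2)
d = 16
W1 = rng.normal(0, 0.1, (d, d))
W2 = rng.normal(0, 0.1, (d, d))
x = rng.normal(size=d)
y = residual_block(x, W1, W2)
```

Note that when the residual weights are zero, the block reduces to the identity on nonnegative inputs; the layers only have to learn the deviation from identity, which is what makes very deep stacks easier to optimize.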
Fig. 3 shows the architecture of the FCN.

III. REMOTE SENSING MEETS DEEP LEARNING

Deep learning is taking off in remote sensing, as shown in Fig. 4, which summarizes the number of papers on the topic since 2014. Their exponential increase confirms the rapid surge of interest in deep learning for remote sensing. In this section, we focus on a variety of remote sensing applications that are achieved by deep learning and provide an in-depth investigation from the perspectives of hyperspectral image analysis, interpretation of SAR images, interpretation of high-resolution satellite images, multimodal data fusion, and 3D reconstruction.

A. Hyperspectral Image Analysis

Hyperspectral sensors are characterized by hundreds of narrow spectral bands. This very high spectral resolution enables the identification of the materials contained in a pixel via spectroscopic analysis. Analysis of hyperspectral data is of high importance in many practical applications, such as land cover/use classification or change and object detection. Also, because high-quality hyperspectral satellite data are becoming available, e.g., via the launch of EnMAP, planned in 2020, and DESIS, planned in 2017, hyperspectral image analysis has been one of the most active research directions in the remote sensing community over the last decade.

[Fig. 4: Statistics on papers related to deep learning in remote sensing. Source: ISI Web of Science, status: September 2017.]

Inspired by the success of deep learning in computer vision, preliminary studies have been carried out on deep learning in hyperspectral data analysis, which brings new momentum into this field. In this section, we review two application cases, namely, land cover/use classification (III-A1) and anomaly detection (III-A2).
[Fig. 5: Flowchart of the 3D CNN architecture proposed in [19] for spectral–spatial hyperspectral image classification.]

1) Hyperspectral Image Classification: Supervised classification is probably the most active research area in hyperspectral data analysis. There is a vast literature on this topic using conventional supervised machine learning models, such as decision trees, random forests, and support vector machines (SVMs) [20], with investigations of hyperspectral image classification [21].