Geoinformatics Unit



Current Position

Naoto Yokoya is a lecturer at the University of Tokyo and the leader of the Geoinformatics Unit at the RIKEN Center for Advanced Intelligence Project (AIP), Japan. His research interests include image processing, data fusion, and machine learning for understanding remote sensing images, with applications to disaster management and environmental monitoring.
He has been a member of the IEEE since 2009 and is a member of the IEEE Geoscience and Remote Sensing Society (GRSS) and the IEEE GRSS Image Analysis and Data Fusion Technical Committee (IADF TC).
He is an associate editor of IEEE Transactions on Geoscience and Remote Sensing (TGRS) and an organizer of the CVPR EarthVision and IJCAI CDCEO workshops.


2020 May - Present    Lecturer, The University of Tokyo, Japan
2019 Apr - 2020 Mar    Visiting Associate Professor, Tokyo University of Agriculture and Technology, Japan
2018 Jan - Present    Unit Leader, RIKEN AIP, Japan
2015 Dec - 2017 Nov    Alexander von Humboldt Research Fellow, DLR & TUM, Germany
2013 Jul - 2017 Dec    Assistant Professor, The University of Tokyo, Japan
2010 Oct - 2013 Mar    Ph.D. in Aerospace Engineering, The University of Tokyo, Japan

Journal Papers

  1. W. He, N. Yokoya, X. Yuan, "Fast hyperspectral image recovery via non-iterative fusion of dual-camera compressive hyperspectral imaging," IEEE Transactions on Image Processing (accepted for publication), 2021.

    Abstract: Coded aperture snapshot spectral imaging (CASSI) is a promising technique for capturing a three-dimensional hyperspectral image (HSI) from a single coded two-dimensional (2D) measurement, with algorithms used to solve the inverse problem. Due to its ill-posed nature, various regularizers have been exploited to reconstruct the 3D data from the 2D measurement. Unfortunately, the accuracy and computational complexity remain unsatisfactory. One feasible solution is to utilize additional information, such as the RGB measurement, in CASSI. Considering the combined CASSI and RGB measurement, in this paper we propose a new fusion model for HSI reconstruction. We investigate the spectral low-rank property of an HSI composed of a spectral basis and spatial coefficients. Specifically, the RGB measurement is utilized to estimate the coefficients, while the CASSI measurement is adopted to provide the orthogonal spectral basis. We further propose a patch processing strategy to enhance the spectral low-rank property of the HSI. The proposed model requires neither non-local processing nor iteration, nor the spectral sensing matrix of the RGB detector. Extensive experiments on both simulated and real HSI datasets demonstrate that our proposed method not only outperforms the previous state of the art in quality but also speeds up reconstruction by more than 5000 times.
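
The fusion model above hinges on the spectral low-rank factorization X ≈ EA: the CASSI measurement supplies an orthogonal spectral basis E, and the RGB measurement pins down the spatial coefficients A. A toy numpy sketch of that factorization (illustrative only: unlike the paper, it assumes the RGB response matrix R is known, and all sizes and data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
bands, pixels, rank = 31, 64, 3        # hypothetical sizes

# Ground-truth HSI with an exactly rank-3 spectral structure: X = E @ A.
E, _ = np.linalg.qr(rng.standard_normal((bands, rank)))  # orthogonal spectral basis
A = rng.standard_normal((rank, pixels))                  # spatial coefficients
X = E @ A                                                # HSI (bands x pixels)

# RGB camera model: Y = R @ X, with an assumed 3 x bands spectral response R.
R = rng.random((3, bands))
Y = R @ X

# Fusion step: with the basis E in hand (from CASSI in the paper, known here),
# recover the coefficients from the RGB measurement by least squares.
A_hat = np.linalg.lstsq(R @ E, Y, rcond=None)[0]
X_hat = E @ A_hat   # recovery is exact here because X is exactly rank-3
```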

  2. J. Xia, N. Yokoya, B. Adriano, L. Zhang, G. Li, Z. Wang, "A benchmark high-resolution GaoFen-3 SAR dataset for building semantic segmentation," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 2021.

    Abstract: Deep learning is increasingly popular in remote sensing communities and has already proven successful in land cover classification and semantic segmentation. However, most studies are limited to optical datasets. Despite a few attempts to apply deep learning to synthetic aperture radar (SAR), its huge potential, especially for very high-resolution (VHR) SAR, remains underexploited. Taking building segmentation as an example, VHR SAR datasets are, to the best of our knowledge, still missing. A comparable baseline for SAR building segmentation does not exist, and which segmentation methods are more suitable for SAR images is poorly understood. This paper first provides a benchmark high-resolution (1 m) GaoFen-3 SAR dataset, which covers nine cities in seven countries, then reviews the state-of-the-art semantic segmentation methods applied to SAR, and finally summarizes potential operations to improve performance. With these comprehensive assessments, we hope to provide recommendations and a roadmap for future SAR semantic segmentation.

  3. D. Hong, L. Gao, J. Yao, N. Yokoya, J. Chanussot, U. Heiden, B. Zhang, "Endmember-guided unmixing network (EGU-Net): A general deep learning framework for self-supervised hyperspectral unmixing," IEEE Transactions on Neural Networks and Learning Systems, 2021.

    Abstract: Over the past decades, enormous efforts have been made to improve the performance of linear and nonlinear mixing models for hyperspectral unmixing, yet their ability to simultaneously generalize across various spectral variabilities and extract physically meaningful endmembers remains limited, owing to weak data fitting and reconstruction and sensitivity to spectral variability. Inspired by the powerful learning ability of deep learning, we develop a general deep learning approach for hyperspectral unmixing that fully considers the properties of endmembers extracted from the hyperspectral imagery, called the endmember-guided unmixing network (EGU-Net). Beyond a standalone autoencoder-like architecture, EGU-Net is a two-stream Siamese deep network that learns an additional network from pure or nearly-pure endmembers to correct the weights of the unmixing network by sharing network parameters and adding spectrally meaningful constraints (e.g., non-negativity and sum-to-one), yielding a more accurate and interpretable unmixing solution. Furthermore, the resulting general framework is not limited to pixel-wise spectral unmixing but is also applicable to spatial information modeling with convolutional operators for spatial-spectral unmixing. Experimental results on three different datasets with ground-truth abundance maps for each material demonstrate the effectiveness and superiority of EGU-Net over state-of-the-art unmixing algorithms. The codes will be available from the website:
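
The sum-to-one and non-negativity constraints mentioned above can be enforced by construction, e.g. with a softmax over abundance logits. A toy numpy sketch of constrained linear unmixing (this is not the EGU-Net architecture; the endmember spectra and logits are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_endmembers, n_bands = 100, 4, 31   # hypothetical sizes

endmembers = rng.random((n_endmembers, n_bands))     # spectral signatures
logits = rng.standard_normal((n_pixels, n_endmembers))

# Softmax turns unconstrained logits into abundances that are
# non-negative and sum to one for every pixel, by construction.
z = np.exp(logits - logits.max(axis=1, keepdims=True))
abundances = z / z.sum(axis=1, keepdims=True)

# Linear mixing model: each pixel is a convex combination of endmembers.
reconstruction = abundances @ endmembers
```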

  4. Y. Qu, H. Qi, C. Kwan, N. Yokoya, J. Chanussot, "Unsupervised and unregistered hyperspectral image super-resolution with mutual Dirichlet-Net," IEEE Transactions on Geoscience and Remote Sensing, pp. 1-18, 2021.

    Abstract: Hyperspectral images (HSI) provide rich spectral information that has contributed to successful performance improvements in numerous computer vision and remote sensing tasks. However, this comes at the expense of the images' spatial resolution. Hyperspectral image super-resolution (HSI-SR) addresses this problem by fusing a low-resolution (LR) HSI with a multispectral image (MSI) carrying much higher spatial resolution (HR). Existing HSI-SR approaches require the LR HSI and HR MSI to be well registered, and the reconstruction accuracy of the HR HSI relies heavily on the registration accuracy of the different modalities. In this paper, we propose an unregistered and unsupervised mutual Dirichlet-Net (u2-MDN) to exploit the uncharted problem domain of HSI-SR without the requirement of multi-modality registration. The success of this endeavor would largely facilitate the deployment of HSI-SR, since the registration requirement is difficult to satisfy with real-world sensing devices. The novelty of this work is three-fold. First, to stabilize the fusion of two unregistered modalities, the network extracts spatial and spectral information of the two modalities, which have different dimensions, through a shared encoder-decoder structure. Second, mutual information (MI) is adopted to capture the non-linear statistical dependencies between the representations from the two modalities (carrying spatial information) and their raw inputs. By maximizing the MI, spatial correlations between the modalities can be well characterized to further reduce spectral distortion. We assume the representations follow a Dirichlet distribution for its inherent sum-to-one and non-negative properties. Third, a collaborative l2,1 norm is employed as the reconstruction error instead of the more common l2 norm to better preserve the spectral information. Extensive experimental results demonstrate the superior performance of u2-MDN compared to the state-of-the-art.
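
The collaborative l2,1 norm above takes an l2 norm over the spectral dimension of each pixel and then sums the results over pixels, so spectral errors are penalized pixel by pixel rather than squared and pooled. A minimal numpy sketch, with the band/pixel axis convention assumed:

```python
import numpy as np

def l21_norm(residual):
    """l2 over bands (axis 0), then l1 (sum) over pixels: sum_i ||r_i||_2."""
    return np.linalg.norm(residual, axis=0).sum()

rng = np.random.default_rng(2)
residual = rng.standard_normal((31, 50))   # bands x pixels, hypothetical
val = l21_norm(residual)                   # non-negative scalar loss
```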

  5. G. Baier, A. Deschemps, M. Schmitt, and N. Yokoya, "Synthesizing optical and SAR imagery from land cover maps and auxiliary raster data," IEEE Transactions on Geoscience and Remote Sensing, 2021.

    Abstract: We synthesize both optical RGB and SAR remote sensing images from land cover maps and auxiliary raster data using GANs. In remote sensing, many types of data, such as digital elevation models or precipitation maps, are often not reflected in land cover maps but still influence image content or structure. Including such data in the synthesis process increases the quality of the generated images and exerts more control over their characteristics. Spatially adaptive normalization layers fuse both inputs and are applied to a full-blown generator architecture consisting of an encoder and decoder to take full advantage of the information content in the auxiliary raster data. Our method successfully synthesizes medium (10 m) and high (1 m) resolution images when trained with the corresponding dataset. We show the advantage of fusing land cover maps and auxiliary information using mean intersection over union, pixel accuracy, and Fréchet inception distance computed with pre-trained U-Net segmentation models. Handpicked images exemplify how fusing information avoids ambiguities in the synthesized images. By slightly editing the input, our method can be used to synthesize realistic changes, e.g., raising the water level. The source code is available at and we published the newly created high-resolution dataset at
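
The mean intersection over union used above is computed per class from the overlap of predicted and reference label maps, then averaged. A small numpy sketch of the metric (generic, not the paper's evaluation code):

```python
import numpy as np

def mean_iou(pred, ref, n_classes):
    """Per-class IoU = |pred AND ref| / |pred OR ref|, averaged over
    classes that appear in either map."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, ref == c).sum()
        union = np.logical_or(pred == c, ref == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
ref  = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, ref, 2)  # class 0: 1/2, class 1: 2/3 -> mean 7/12
```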

  6. C. Robinson, K. Malkin, N. Jojic, H. Chen, R. Qin, C. Xiao, M. Schmitt, P. Ghamisi, R. Haensch, and N. Yokoya, "Global land cover mapping with weak supervision: Outcome of the 2020 IEEE GRSS Data Fusion Contest," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 2021.

    Abstract: This paper presents the scientific outcomes of the 2020 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2020 Contest addressed the problem of automatic global land-cover mapping with weak supervision, i.e., estimating high-resolution semantic maps while only low-resolution reference data are available during training. Two separate competitions were organized to assess two different scenarios: 1) high-resolution labels are not available at all, and 2) a small amount of high-resolution labels is available in addition to low-resolution reference data. In this paper, we describe the DFC2020 dataset, which remains available for further evaluation of corresponding approaches, and report the results of the best-performing methods during the contest.

  7. B. Adriano, N. Yokoya, J. Xia, H. Miura, W. Liu, M. Matsuoka, S. Koshimura, "Learning from multimodal and multitemporal earth observation data for building damage mapping," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 175, pp. 132-143, 2021.

    Abstract: Earth observation (EO) technologies, such as optical imaging and synthetic aperture radar (SAR), provide excellent means to continuously monitor ever-growing urban environments. Notably, in the case of large-scale disasters (e.g., tsunamis and earthquakes), in which a response is highly time-critical, images from both data modalities can complement each other to accurately convey the full damage condition in the disaster aftermath. However, due to several factors, such as weather and satellite coverage, which data modality will be the first available for rapid disaster response efforts is often uncertain. Hence, novel methodologies that can utilize all accessible EO datasets are essential for disaster management. In this study, we developed a global multimodal and multitemporal dataset for building damage mapping. We included building damage characteristics from three disaster types, namely, earthquakes, tsunamis, and typhoons, and considered three building damage categories. The global dataset contains high-resolution (HR) optical imagery and high-to-moderate-resolution SAR data acquired before and after each disaster. Using this comprehensive dataset, we analyzed five data modality scenarios for damage mapping: single-mode (optical and SAR datasets), cross-modal (pre-disaster optical and post-disaster SAR datasets), and mode fusion scenarios. We defined a damage mapping framework for semantic segmentation of damaged buildings based on a deep convolutional neural network (CNN) algorithm. We also compared our approach to another state-of-the-art model for damage mapping. The results indicated that our dataset, together with a deep learning network, enabled acceptable predictions for all the data modality scenarios. We also found that the results from cross-modal mapping were comparable to the results obtained from a fusion sensor and optical mode analysis.

  8. D. Hong, W. He, N. Yokoya, J. Yao, L. Gao, L. Zhang, J. Chanussot, and X.X. Zhu, "Interpretable hyperspectral AI: When non-convex modeling meets hyperspectral remote sensing," IEEE Geoscience and Remote Sensing Magazine, 2021.

    Abstract: Hyperspectral imaging, also known as imaging spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products, mainly by seasoned experts. However, with the ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges for reducing the burden of manual labor and improving efficiency. It is therefore urgent to develop more intelligent and automatic approaches for various HS RS applications. Machine learning (ML) tools with convex optimization have successfully undertaken numerous artificial intelligence (AI)-related applications. However, their ability to handle complex practical problems remains limited, particularly for HS data, due to the effects of various spectral variabilities in the HS imaging process and the complexity and redundancy of high-dimensional HS signals. Compared to convex models, non-convex modeling, which can characterize more complex real scenes and provide model interpretability both technically and theoretically, has proven to be a feasible solution for narrowing the gap between challenging HS vision tasks and currently advanced intelligent data processing models. This article presents an advanced, cutting-edge technical survey of non-convex modeling towards interpretable AI models, covering a broad scope of the following HS RS topics: 1) HS image restoration, 2) dimensionality reduction, 3) data fusion and enhancement, 4) spectral unmixing, and 5) cross-modality learning for large-scale land cover mapping. Around these topics, we showcase the significance of non-convex techniques in bridging the gap between HS RS and interpretable AI models with a brief introduction of the research background and motivation, an emphasis on the resulting methodological foundations and solutions, and an intuitive clarification of illustrative examples. At the end of each topic, we also pose the remaining challenges of completely modeling complex spectral vision issues from the perspective of intelligent ML combined with physical priors and numerical non-convex modeling, and accordingly point out future research directions. This paper aims to create a good entry point to the advanced literature for experienced researchers, Ph.D. students, and engineers who already have some background in HS RS, ML, and optimization, helping them launch new investigations on the basis of the above topics and interpretable AI techniques.

  9. T. D. Pham, N. Yokoya, T. T. T. Nguyen, N. N. Le, N. T. Ha, J. Xia, W. Takeuchi, T. D. Pham, "Improvement of mangrove soil carbon stocks estimation in North Vietnam using Sentinel-2 data and machine learning approach," GIScience & Remote Sensing, pp. 68-87, 2021.

    Abstract: Quantifying total carbon (TC) stocks in soil across various mangrove ecosystems is key to understanding the global carbon cycle to reduce greenhouse gas emissions. Estimating mangrove TC at a large scale remains challenging due to the difficulty and high cost of soil carbon measurements when the number of samples is high. In the present study, we investigated the capability of Sentinel-2 multispectral data together with a state-of-the-art machine learning (ML) technique, which combines CatBoost regression (CBR) with a genetic algorithm (GA) for feature selection and optimization (the CBR-GA model), to estimate mangrove soil C stocks across the mangrove ecosystems of North Vietnam. We used field survey data collected from 177 soil cores. We compared the performance of the proposed model with those of four ML algorithms, i.e., the extreme gradient boosting regression (XGBR), the light gradient boosting machine regression (LGBMR), the support vector regression (SVR), and the random forest regression (RFR) models. Our proposed model estimated the TC level in the soil as 35.06-166.83 Mg ha−1 (average = 92.27 Mg ha−1) with satisfactory accuracy (R2 = 0.665, RMSE = 18.41 Mg ha−1) and yielded the best prediction performance among all the ML techniques. We conclude that the Sentinel-2 data combined with the CBR-GA model can improve estimates of mangrove TC at 10 m spatial resolution in tropical areas. The effectiveness of the proposed approach should be further evaluated for the different mangrove soils of other mangrove ecosystems in tropical and semi-tropical regions.

  10. M. Pourshamsi, J. Xia, N. Yokoya, M. Garcia, M. Lavalle, E. Pottier, and H. Balzter, "Tropical forest canopy height estimation from combined polarimetric SAR and LiDAR using machine learning," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 172, pp. 79-94, 2021.

    Abstract: Forest height is an important forest biophysical parameter used to derive key information about forest ecosystems, such as forest above-ground biomass. In this paper, the potential of combining Polarimetric Synthetic Aperture Radar (PolSAR) variables with LiDAR measurements for forest height estimation is investigated. This is conducted using different machine learning algorithms, including Random Forests (RFs), Rotation Forests (RoFs), Canonical Correlation Forests (CCFs), and Support Vector Machines (SVMs). Various PolSAR parameters are required as input variables to ensure a successful height retrieval across different forest height ranges. The algorithms are trained with 5000 LiDAR samples (less than 1% of the full scene) and different polarimetric variables. To examine the dependency of the algorithms on the input training samples, three different subsets are identified, each including different features: subset 1 is quite diverse and includes non-vegetated regions, short/sparse vegetation (0–20 m), vegetation with mid-range height (20–40 m), and tall/dense vegetation (40–60 m); subset 2 covers mostly densely vegetated areas with heights of 40–60 m; and subset 3 mostly covers non-vegetated to short/sparse vegetation (0–20 m). The trained algorithms were used to estimate the height for areas outside the identified subsets. The results were validated with independent samples of LiDAR-derived height, showing high accuracy (average R2 = 0.70 and RMSE = 10 m across all algorithms and training samples). The results confirm that it is possible to estimate forest canopy height using PolSAR parameters together with a small coverage of LiDAR height as training data.

  11. N. Yokoya, K. Yamanoi, W. He, G. Baier, B. Adriano, H. Miura, and S. Oishi, "Breaking limits of remote sensing by deep learning from simulated data for flood and debris flow mapping," IEEE Transactions on Geoscience and Remote Sensing (early access), 2021.

    Abstract: We propose a framework that estimates the inundation depth (maximum water level) and debris-flow-induced topographic deformation from remote sensing imagery by integrating deep learning and numerical simulation. A water and debris-flow simulator generates training data for various artificial disaster scenarios. We show that regression models based on Attention U-Net and LinkNet architectures trained on such synthetic data can predict the maximum water level and topographic deformation from a remote sensing-derived change detection map and a digital elevation model. The proposed framework has an inpainting capability, thus mitigating the false negatives that are inevitable in remote sensing image analysis. Our framework breaks limits of remote sensing and enables rapid estimation of inundation depth and topographic deformation, essential information for emergency response, including rescue and relief activities. We conduct experiments with both synthetic and real data for two disaster events that caused simultaneous flooding and debris flows and demonstrate the effectiveness of our approach quantitatively and qualitatively. Our code and data sets are available at

  12. Y. Lian, T. Feng, J. Zhou, M. Jia, A. Li, Z. Wu, L. Jiao, M. Brown, G. Hager, N. Yokoya, R. Haensch, and B. Le Saux, "Large-scale semantic 3D reconstruction: Outcome of the 2019 IEEE GRSS Data Fusion Contest - Part B," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1158-1170, 2021.

    Abstract: We present the scientific outcomes of the 2019 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The contest included challenges with large-scale datasets for semantic 3-D reconstruction from satellite images and also semantic 3-D point cloud classification from airborne LiDAR. 3-D reconstruction results are discussed separately in Part-A. In this Part-B, we report the results of the two best-performing approaches for 3-D point cloud classification. Both are deep learning methods that improve upon the PointSIFT model with mechanisms to combine multiscale features and task-specific postprocessing to refine model outputs.

  13. S. Kunwar, H. Chen, M. Lin, H. Zhang, P. D'Angelo, D. Cerra, S. M. Azimi, M. Brown, G. Hager, N. Yokoya, R. Haensch, and B. Le Saux, "Large-scale semantic 3D reconstruction: Outcome of the 2019 IEEE GRSS Data Fusion Contest - Part A," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 922-935, 2021.

    Abstract: In this paper, we present the scientific outcomes of the 2019 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2019 Contest addressed the problem of 3D reconstruction and 3D semantic understanding on a large scale. Several competitions were organized to assess specific issues, such as elevation estimation and semantic mapping from a single view, two views, or multiple views. In this Part A, we report the results of the best-performing approaches for semantic 3D reconstruction according to these various set-ups, while 3D point cloud semantic mapping is discussed in Part B.

  14. J. Xia, N. Yokoya, and T. D. Pham, "Probabilistic mangrove species mapping with multiple-source remote-sensing datasets using label distribution learning in Xuan Thuy National Park, Vietnam," Remote Sensing, vol. 12, no. 22, 3834, 2020.

    Abstract: Mangrove forests play an important role in maintaining water quality, mitigating climate change impacts, and providing a wide range of ecosystem services. Effective identification of mangrove species using remote-sensing images remains a challenge. Combinations of multi-source remote-sensing datasets (with different spectral/spatial resolutions) are beneficial for improving mangrove tree species discrimination. In this paper, various combinations of remote-sensing datasets, including Sentinel-1 dual-polarimetric synthetic aperture radar (SAR), Sentinel-2 multispectral, and Gaofen-3 full-polarimetric SAR data, were used to classify the mangrove communities in Xuan Thuy National Park, Vietnam. The mixture of mangrove communities, consisting of small and shrub mangrove patches, is generally difficult to separate using low/medium spatial resolution data. To alleviate this problem, we propose to use label distribution learning (LDL) to provide probabilistic mapping of tree species, including Sonneratia caseolaris (SC), Kandelia obovata (KO), Aegiceras corniculatum (AC), Rhizophora stylosa (RS), and Avicennia marina (AM). The experimental results show that the best classification performance was achieved by an integration of the Sentinel-2 and Gaofen-3 datasets, demonstrating that full-polarimetric Gaofen-3 data are superior to dual-polarimetric Sentinel-1 data for mapping mangrove tree species in the tropics.
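
Label distribution learning, as used above, predicts a full probability distribution over species for each pixel rather than a single label; a common training signal is the KL divergence between the label distribution and the prediction. A toy numpy sketch (the species distributions below are invented for illustration):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) per pixel, with distributions along the last axis."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

# Hypothetical label distributions over 5 mangrove species for 3 pixels.
labels = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                   [0.20, 0.60, 0.10, 0.05, 0.05],
                   [0.10, 0.10, 0.60, 0.10, 0.10]])
preds = np.full((3, 5), 0.2)   # uniform prediction as a weak baseline

loss = kl_divergence(labels, preds).mean()   # positive for non-uniform labels
```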

  15. W. He, Q. Yao, C. Li, N. Yokoya, Q. Zhao, H. Zhang, and L. Zhang, "Non-local meets global: An iterative paradigm for hyperspectral image restoration," IEEE Transactions on Pattern Analysis and Machine Intelligence (early access), 2020.

    Abstract: Non-local low-rank tensor approximation has been developed as a state-of-the-art method for hyperspectral image (HSI) restoration, which includes the tasks of denoising, compressed HSI reconstruction and inpainting. Unfortunately, while its restoration performance benefits from more spectral bands, its runtime also substantially increases. In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and the spectral subspaces of each full band patch group should lie in this global low-rank subspace. This motivates us to propose a unified paradigm combining the spatial and spectral properties for HSI restoration. The proposed paradigm enjoys performance superiority from the non-local spatial denoising and light computation complexity from the low-rank orthogonal basis exploration. An efficient alternating minimization algorithm with rank adaptation is developed. It is done by first solving a fidelity term-related problem for the update of a latent input image, and then learning a low-dimensional orthogonal basis and the related reduced image from the latent input image. Subsequently, non-local low-rank denoising is developed to refine the reduced image and orthogonal basis iteratively. Finally, the experiments on HSI denoising, compressed reconstruction, and inpainting tasks, with both simulated and real datasets, demonstrate its superiority with respect to state-of-the-art HSI restoration methods.
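
The paradigm above represents the HSI in a global spectral low-rank subspace: learn a low-dimensional orthogonal basis E, work on the reduced image Z = EᵀX, and map back. The simplest version of that step, projection onto the top singular vectors, can be sketched in numpy (illustrative only; the paper's method additionally refines Z and E with non-local low-rank denoising):

```python
import numpy as np

rng = np.random.default_rng(3)
bands, pixels, rank = 31, 500, 3

# Clean HSI living in a 3-dimensional global spectral subspace, plus noise.
basis, _ = np.linalg.qr(rng.standard_normal((bands, rank)))
X_clean = basis @ rng.standard_normal((rank, pixels))
X_noisy = X_clean + 0.1 * rng.standard_normal((bands, pixels))

# Learn an orthogonal basis E from the noisy data (top singular vectors),
# form the reduced image Z = E^T X, and project back onto the subspace.
U, s, _ = np.linalg.svd(X_noisy, full_matrices=False)
E = U[:, :rank]                 # estimated spectral basis
Z = E.T @ X_noisy               # reduced image (rank x pixels)
X_denoised = E @ Z              # projection keeps ~rank/bands of the noise

def rel_err(a, b):
    return np.linalg.norm(a - b) / np.linalg.norm(b)
```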

  16. D. Hong, J. Kang, N. Yokoya, and J. Chanussot, "Graph-induced aligned learning on subspaces for hyperspectral and multispectral data," IEEE Transactions on Geoscience and Remote Sensing (early access), 2020.

    Abstract: In this article, we investigate a common but practical issue in remote sensing (RS): can a limited amount of one information-rich (high-quality) data source, e.g., a hyperspectral (HS) image, improve the performance of a classification task that uses a large amount of another information-poor (low-quality) data source, e.g., a multispectral (MS) image? This question leads to a typical cross-modality feature learning problem. However, classic cross-modality representation learning approaches, e.g., manifold alignment, remain limited in effectively and efficiently handling problems where data from the high-quality modality are largely absent. For this reason, we propose a novel graph-induced aligned learning (GiAL) framework that 1) adaptively learns a unified graph (further yielding a Laplacian matrix) from the data in order to align multimodality data (MS-HS data) into a latent shared subspace; 2) simultaneously models two regression behaviors with respect to labels and pseudo-labels under a multitask learning paradigm; and 3) dramatically updates the pseudo-labels according to the learned graph and refeeds the latest pseudo-labels into the model learning of the next round. In addition, an optimization framework based on the alternating direction method of multipliers (ADMM) is devised to solve the proposed GiAL model. Extensive experiments are conducted on two MS-HS RS datasets, demonstrating the superiority of the proposed GiAL compared with several state-of-the-art methods.

  17. D. Hong, N. Yokoya, J. Chanussot, J. Xu, and X. X. Zhu, "Joint and progressive subspace analysis (JPSA) with spatial-spectral manifold alignment for semi-supervised hyperspectral dimensionality reduction," IEEE Transactions on Cybernetics (accepted for publication), 2020.

    Abstract: Conventional nonlinear subspace learning techniques (e.g., manifold learning) usually introduce drawbacks in explainability (explicit mapping), cost-effectiveness (linearization), generalization capability (out-of-sample), and representability (spatial-spectral discrimination). To overcome these shortcomings, a novel linearized subspace analysis technique with spatial-spectral manifold alignment is developed for semi-supervised hyperspectral dimensionality reduction (HDR), called joint and progressive subspace analysis (JPSA). JPSA learns a high-level, semantically meaningful, joint spatial-spectral feature representation from hyperspectral data by 1) jointly learning latent subspaces and a linear classifier to find an effective projection direction favorable for classification; 2) progressively searching several intermediate states of subspaces to approach an optimal mapping from the original space to a potentially more discriminative subspace; and 3) spatially and spectrally aligning the manifold structure in each learned latent subspace in order to preserve the same or similar topological properties between the compressed and original data. A simple but effective classifier, i.e., nearest neighbor (NN), is explored as a potential application for validating the performance of different HDR approaches. Extensive experiments demonstrate the superiority and effectiveness of the proposed JPSA on two widely used hyperspectral datasets: Indian Pines (92.98%) and the University of Houston (86.09%), in comparison with previous state-of-the-art HDR methods. The demo of this basic work (i.e., ECCV2018) is openly available at

  18. D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, and B. Zhang, "More diverse means better: Multimodal deep learning meets remote-sensing imagery classification," IEEE Transactions on Geoscience and Remote Sensing (early access), 2020.

    Abstract: Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS) and have garnered growing attention owing to recent advances in deep learning. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that require fine classification, due to the limited diversity of information. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on "what", "where", and "how" to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments on the settings of MML and CML are conducted on two different multimodal RS datasets. Furthermore, the codes and datasets will be available at:, contributing to the RS community.

  19. D. Hong, N. Yokoya, G.-S. Xia, J. Chanussot, X. X. Zhu, " X-ModalNet: A Semi-Supervised Deep Cross-Modal Network for Classification of Remote Sensing Data ," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 167, pp. 12-23, 2020.
    Quick Abstract

    Abstract: This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multi-modal earth observation images, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, are openly available on a global scale, enabling parsing of global urban scenes through remote sensing imagery. However, their ability to identify materials (pixel-wise classification) remains limited, due to the noisy collection environment, poor discriminative information, and a limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules: a self-adversarial module, an interactive learning module, and a label propagation module, which learns to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to propagating labels on an updatable graph constructed from high-level features on top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multi-modal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement over several state-of-the-art methods.

  20. E. Mas, R. Paulik, K. Pakoksung, B. Adriano, L. Moya, A. Suppasri, A. Muhari, R. Khomarudin, N. Yokoya, M. Matsuoka, and S. Koshimura, " Characteristics of Tsunami Fragility Functions Developed Using Different Sources of Damage Data from the 2018 Sulawesi Earthquake and Tsunami ," Pure and Applied Geophysics, 2020.
    Quick Abstract

    Abstract: We developed tsunami fragility functions using three sources of damage data from the 2018 Sulawesi tsunami at Palu Bay in Indonesia obtained from (i) field survey data (FS), (ii) a visual interpretation of optical satellite images (VI), and (iii) a machine learning and remote sensing approach utilized on multisensor and multitemporal satellite images (MLRS). Tsunami fragility functions are cumulative distribution functions that express the probability of a structure reaching or exceeding a particular damage state in response to a specific tsunami intensity measure, in this case obtained from the interpolation of multiple surveyed points of tsunami flow depth. We observed that the FS approach led to a more consistent function than that of the VI and MLRS methods. In particular, an initial damage probability observed at zero inundation depth in the latter two methods revealed the effects of misclassifications on tsunami fragility functions derived from VI data; however, it also highlighted the remarkable advantages of MLRS methods. The reasons and insights used to overcome such limitations are discussed together with the pros and cons of each method. The results show that the tsunami damage observed in the 2018 Sulawesi event in Indonesia, expressed in the fragility function developed herein, is similar in shape to the function developed after the 1993 Hokkaido Nansei-oki tsunami, albeit with a slightly lower damage probability between zero-to-five-meter inundation depths. On the other hand, in comparison with the fragility function developed after the 2004 Indian Ocean tsunami in Banda Aceh, the characteristics of Palu structures exhibit higher fragility in response to tsunamis. The two-meter inundation depth exhibited nearly 20% probability of damage in the case of Banda Aceh, while the probability of damage was close to 70% at the same depth in Palu.
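
As the abstract notes, a tsunami fragility function is a cumulative distribution function of damage probability versus an intensity measure such as flow depth. A common parameterization is a lognormal CDF; the sketch below fits one to synthetic damage fractions with SciPy (the median depth and log-standard deviation are invented for the example, not taken from the study).

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fragility(depth, mu, sigma):
    # Lognormal fragility: P(damage state reached or exceeded | flow depth)
    return norm.cdf((np.log(depth) - mu) / sigma)

# Synthetic observations: damage fractions at surveyed flow depths (m)
depths = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0])
observed = fragility(depths, np.log(2.0), 0.6)   # hypothetical truth: median 2 m

# Recover the parameters by nonlinear least squares
(mu_hat, sigma_hat), _ = curve_fit(fragility, depths, observed, p0=(0.0, 1.0))
```

With real survey data the observed fractions would come from binning damaged/intact building counts by interpolated flow depth, rather than from the model itself as done here.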

  21. M. E. Paoletti, J. M. Haut, P. Ghamisi, N. Yokoya, J. Plaza, A. Plaza, " U-IMG2DSM: Unpaired Simulation of Digital Surface Models with Generative Adversarial Networks ," IEEE Geoscience and Remote Sensing Letters (Early Access), 2020.
    Quick Abstract

    Abstract: High-resolution digital surface models (DSMs) provide valuable height information about the Earth's surface, which can be successfully combined with other types of remotely sensed data in a wide range of applications. However, the acquisition of DSMs with high spatial resolution is extremely time-consuming and expensive, and their estimation from a single optical image is an ill-posed problem. To overcome these limitations, this letter presents a new unpaired approach to obtain DSMs from optical images using deep learning techniques. Specifically, our new deep neural model is based on variational autoencoders (VAEs) and generative adversarial networks (GANs) to perform image-to-image translation, obtaining DSMs from optical images. Our newly proposed method has been tested in terms of photographic interpretation, reconstruction error, and classification accuracy using three well-known remotely sensed data sets with very high spatial resolution (obtained over Potsdam, Vaihingen, and Stockholm). Our experimental results demonstrate that the proposed approach obtains satisfactory reconstruction rates that allow enhancing the classification results for these images. The source code of our method is openly available.

  22. Y. Chen, T.-Z. Huang, W. He, N. Yokoya, and X.-L. Zhao, " Hyperspectral Image Compressive Sensing Reconstruction Using Subspace-based Nonlocal Tensor Ring Decomposition ," IEEE Transactions on Image Processing (Early Access), pp. 1-16, 2020.
    Quick Abstract

    Abstract: Hyperspectral image compressive sensing reconstruction (HSI-CSR) can largely reduce the high expense and low efficiency of transmitting HSI to ground stations by storing a few compressive measurements, but precisely reconstructing the HSI from a few compressive measurements is a challenging issue. It has been proven that considering the global spectral correlation, spatial structure, and nonlocal self-similarity priors of HSI can achieve satisfactory reconstruction performance. However, most of the existing methods cannot simultaneously capture the mentioned priors and directly design the regularization term on the HSI. In this article, we propose a novel subspace-based nonlocal tensor ring decomposition method (SNLTR) for HSI-CSR. Instead of designing a regularization of the low-rank approximation to the HSI, we assume that the HSI lies in a low-dimensional subspace. Moreover, to explore the nonlocal self-similarity and preserve the spatial structure of HSI, we introduce a nonlocal tensor ring decomposition strategy to constrain the related coefficient image, which decreases the computational cost compared to methods that directly employ nonlocal regularization on the HSI. Finally, a well-known alternating minimization method is designed to efficiently solve the proposed SNLTR. Extensive experimental results demonstrate that our SNLTR method significantly outperforms existing approaches for HSI-CSR.
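
The spectral low-rank assumption at the core of such subspace methods, an HSI factored into an orthogonal spectral basis times coefficient images, can be sketched with a plain SVD on synthetic data. This illustrates only the subspace model, not the paper's nonlocal tensor ring machinery; all sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
h, w, bands, k = 16, 16, 30, 4
# Synthetic HSI that truly lies in a k-dimensional spectral subspace
basis_true = rng.normal(size=(bands, k))
coeff_true = rng.normal(size=(k, h * w))
hsi = (basis_true @ coeff_true).reshape(bands, h, w)

# Unfold along the spectral mode and estimate an orthogonal basis by SVD
unfolded = hsi.reshape(bands, -1)
u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
basis = u[:, :k]                 # orthogonal spectral basis (bands x k)
coeff = basis.T @ unfolded       # coefficient images (k x pixels)

# The basis/coefficient pair reconstructs the cube essentially exactly
recon = basis @ coeff
err = np.linalg.norm(recon - unfolded) / np.linalg.norm(unfolded)
```

In the paper's setting, the regularization is then applied to the small coefficient images rather than to the full HSI, which is where the computational savings come from.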

  23. G. Baier, W. He, and N. Yokoya, " Robust nonlocal low-rank SAR time series despeckling considering speckle correlation by total variation regularization ," IEEE Transactions on Geoscience and Remote Sensing (Early Access), 2020.
    Quick Abstract

    Abstract: Outliers and speckle both corrupt synthetic aperture radar (SAR) time series. Furthermore, due to the coherence between SAR acquisitions, their speckle can no longer be regarded as independent. We propose a nonlocal low-rank time series despeckling algorithm that is robust against outliers and also specifically addresses speckle correlation between acquisitions. By imposing a total variation regularization on the signal's speckle component, its correlation between acquisitions can be captured, facilitating the extraction of outliers from the unfiltered signal and correlated speckle. Robustness against outliers also addresses matching errors and inaccuracies in the nonlocal similarity search. Such errors include mismatched data in the nonlocal estimation process, which degrade denoising performance in conventional similarity-based filtering approaches. Multiple experiments on real and synthetic data assess the proposed approach's performance by comparing it to state-of-the-art methods. It provides filtering results of comparable quality but is not adversely affected by outliers.

  24. C. Yoo, J. Im, D. Cho, N. Yokoya, J. Xia, and B. Bechtel, " Estimation of all-weather 1 km MODIS land surface temperature for humid summer days ," Remote Sensing, vol. 12, pp. 1398, 2020.
    PDF    Quick Abstract

    Abstract: Land surface temperature (LST) is used as a critical indicator for various environmental issues because it links land surface fluxes with the surface atmosphere. Moderate Resolution Imaging Spectroradiometer (MODIS) 1 km LSTs have been widely utilized but have the serious limitation of not being provided under cloudy weather conditions. In this study, we propose two schemes to estimate all-weather 1 km Aqua MODIS daytime (1:30 p.m.) and nighttime (1:30 a.m.) LSTs in South Korea for humid summer days. Scheme 1 (S1) is a two-step approach that first estimates 10 km LSTs and then conducts the spatial downscaling of LSTs from 10 km to 1 km. Scheme 2 (S2), a one-step algorithm, directly estimates the 1 km all-weather LSTs. Eight advanced microwave scanning radiometer 2 (AMSR2) brightness temperatures, three MODIS-based annual cycle parameters, and six auxiliary variables were used for the LST estimation based on random forest machine learning. To confirm the effectiveness of each scheme, we performed different validation experiments using clear-sky MODIS LSTs. Moreover, we validated all-weather LSTs using bias-corrected LSTs from 10 in situ stations. In clear-sky daytime, the performance of S2 was better than that of S1. However, in cloudy-sky daytime, S1 simulated low LSTs better than S2, with an average root mean squared error (RMSE) of 2.6 °C compared to an average RMSE of 3.8 °C over the 10 stations. At nighttime, S1 and S2 demonstrated no significant difference in performance under both clear and cloudy sky conditions. When the two schemes were combined, the proposed all-weather LSTs resulted in an average R2 of 0.82 and 0.74, with RMSEs of 2.5 °C and 1.4 °C, for daytime and nighttime, respectively, compared to the in situ data. This paper demonstrates the ability of the two different schemes to produce all-weather dynamic LSTs. The strategy proposed in this study can improve the applicability of LSTs in a variety of research and practical fields, particularly for areas that are very frequently covered with clouds.
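
The random-forest step described above, regressing LST on microwave brightness temperatures plus auxiliary variables, trained where clear-sky LST exists and applied under clouds, can be sketched as follows. The predictor set and the linear toy relationship are invented for the example; the study uses eight AMSR2 channels, three annual-cycle parameters, and six auxiliary variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 1000
tb = rng.uniform(220.0, 290.0, n)     # stand-in microwave brightness temperature (K)
elev = rng.uniform(0.0, 1500.0, n)    # stand-in elevation (m)
ndvi = rng.uniform(0.0, 1.0, n)       # stand-in vegetation index
# Invented relationship playing the role of the true LST field
lst = 0.8 * tb - 0.006 * elev + 5.0 * ndvi + rng.normal(0.0, 0.5, n)

X = np.column_stack([tb, elev, ndvi])
clear = rng.random(n) > 0.3           # pretend ~70% of pixels are clear-sky

# Train where clear-sky "MODIS" LST exists, then fill in the cloudy pixels
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[clear], lst[clear])
pred = model.predict(X[~clear])
rmse = float(np.sqrt(np.mean((pred - lst[~clear]) ** 2)))
```

The microwave predictors make this gap-filling possible because, unlike thermal infrared, they are largely insensitive to cloud cover.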

  25. T.D. Pham, N. Yokoya, J. Xia, N.T. Ha, N.N. Le, T.T.T. Nguyen, T.H. Dao, T.T.P. Vu, T.D. Pham, and W. Takeuchi, " Comparison of machine learning methods for estimating mangrove above-ground biomass using multiple source remote sensing data in the red river delta biosphere reserve, Vietnam ," Remote Sensing, vol. 12, pp. 1334, 2020.
    PDF    Quick Abstract

    Abstract: This study proposes a hybrid intelligence approach based on an extreme gradient boosting regression and genetic algorithm, namely, the XGBR-GA model, incorporating Sentinel-2, Sentinel-1, and ALOS-2 PALSAR-2 data to estimate the mangrove above-ground biomass (AGB), including small and shrub mangrove patches, in the Red River Delta biosphere reserve on the northern coast of Vietnam. We used the novel extreme gradient boosting decision tree (XGBR) technique together with genetic algorithm (GA) optimization for feature selection to construct and verify a mangrove AGB model using data from a field survey of 105 sampling plots conducted in November and December of 2018. The model incorporated the dual polarimetric (HH and HV) data of the ALOS-2 PALSAR-2 L-band and the Sentinel-2 multispectral data combined with Sentinel-1 (C-band VV and VH) data. We employed the root-mean-square error (RMSE) and coefficient of determination (R2) to evaluate the performance of the proposed model. The capability of the XGBR-GA model was assessed via a comparison with other machine-learning (ML) techniques, i.e., the CatBoost regression (CBR), gradient boosted regression tree (GBRT), support vector regression (SVR), and random forest regression (RFR) models. The XGBR-GA model yielded a promising result (R2 = 0.683, RMSE = 25.08 Mg·ha−1) and outperformed the four other ML models. The XGBR-GA model retrieved a mangrove AGB ranging from 17 Mg·ha−1 to 142 Mg·ha−1 (with an average of 72.47 Mg·ha−1). Therefore, multisource optical and synthetic aperture radar (SAR) data combined with the XGBR-GA model can be used to estimate the mangrove AGB in North Vietnam. The effectiveness of the proposed method needs to be further tested in other mangrove ecosystems in the tropics.
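
The XGBR-GA idea, a genetic algorithm searching over feature subsets scored by a boosted-tree regressor, can be sketched as below. This uses scikit-learn's GradientBoostingRegressor as a stand-in for XGBoost, with a deliberately tiny population and generation count; the data and hyperparameters are invented for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, d = 300, 8
X = rng.normal(size=(n, d))
# Only the first three "bands" actually carry biomass signal
y = 30 * X[:, 0] + 20 * X[:, 1] - 10 * X[:, 2] + rng.normal(0.0, 1.0, n)

def fitness(mask):
    """Cross-validated R2 of a boosted-tree model on the selected features."""
    if not mask.any():
        return -np.inf
    model = GradientBoostingRegressor(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3, scoring="r2").mean()

# Tiny genetic search over boolean feature masks: keep the fittest, mutate
pop = rng.random((8, d)) > 0.5
for _ in range(3):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-4:]]
    children = parents[rng.integers(0, 4, size=4)].copy()
    children ^= rng.random(children.shape) < 0.1   # bit-flip mutation
    pop = np.vstack([parents, children])

best_mask = max(pop, key=fitness)
best_r2 = fitness(best_mask)
```

A production GA would add crossover and a larger population; the point here is only the loop structure: candidate masks, model-based fitness, selection, mutation.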

  26. J. Kang, D. Hong, J. Liu, G. Baier, N. Yokoya, and B. Demir, " Learning Convolutional Sparse Coding on Complex Domain for Interferometric Phase Restoration ," IEEE Transactions on Neural Networks and Learning Systems, 2020.
  27. L. Moya, A. Muhari, B. Adriano, S. Koshimura, E. Mas, L. R. M. Perezd, and N. Yokoya, " Detecting urban changes using phase correlation and l1-based sparse model for early disaster response: A case study of the 2018 Sulawesi Indonesia earthquake-tsunami ," Remote Sensing of Environment (accepted for publication), 2020.
  28. T. D. Pham, N. N. Le, N. T. Ha, L. V. Nguyen, J. Xia, N. Yokoya, T. T. To, H. X. Trinh, L. Q. Kieu, and W. Takeuchi, " Estimating Mangrove Above-ground Biomass using Extreme Gradient Boosting Decision Trees Algorithm with a fusion of Sentinel-2 and ALOS-2 PALSAR-2 data in Can Gio Biosphere Reserve, Vietnam ," Remote Sensing, vol. 12, no. 5, pp. 777, 2020.
    PDF    Quick Abstract

    Abstract: This study investigates the effectiveness of gradient boosting decision tree techniques in estimating mangrove above-ground biomass (AGB) at the Can Gio biosphere reserve (Vietnam). For this purpose, we employed a novel gradient-boosting regression technique, the extreme gradient boosting regression (XGBR) algorithm, and implemented and verified a mangrove AGB model using data from a field survey of 121 sampling plots conducted during the dry season. The dataset fuses the data of the Sentinel-2 multispectral instrument (MSI) and the dual polarimetric (HH, HV) data of ALOS-2 PALSAR-2. The performance metrics of the proposed model (root-mean-square error (RMSE) and coefficient of determination (R2)) were compared with those of other machine learning techniques, namely gradient boosting regression (GBR), support vector regression (SVR), Gaussian process regression (GPR), and random forest regression (RFR). The XGBR model obtained a promising result with R2 = 0.805 and RMSE = 28.13 Mg ha−1, yielding the highest predictive performance among the five machine learning models. The estimated mangrove AGB from the XGBR model ranged from 11 to 293 Mg ha−1 (average = 106.93 Mg ha−1). This work demonstrates that XGBR with the combined Sentinel-2 and ALOS-2 PALSAR-2 data can accurately estimate the mangrove AGB in the Can Gio biosphere reserve. The general applicability of the XGBR model combined with multi-source optical and SAR data should be further tested and compared in large-scale studies of forest AGB in different geographical and climatic ecosystems.

  29. B. Adriano, N. Yokoya, H. Miura, M. Matsuoka, and S. Koshimura, " A semiautomatic pixel-object method for detecting landslides using multitemporal ALOS-2 intensity images ," Remote Sensing, vol. 12, no. 3, pp. 561, 2020.
    PDF    Quick Abstract

    Abstract: The rapid and accurate mapping of large-scale landslides and other mass movement disasters is crucial for prompt disaster response efforts and immediate recovery planning. As such, remote sensing information, especially from synthetic aperture radar (SAR) sensors, has significant advantages over cloud-covered optical imagery and conventional field survey campaigns. In this work, we introduced an integrated pixel-object image analysis framework for landslide recognition using SAR data. The robustness of our proposed methodology was demonstrated by mapping two different source-induced landslide events, namely, the debris flows following the torrential rainfall that fell over Hiroshima, Japan, in early July 2018 and the coseismic landslide that followed the 2018 Mw6.7 Hokkaido earthquake. For both events, only a pair of SAR images acquired before and after each disaster by the Advanced Land Observing Satellite-2 (ALOS-2) was used. Additional information, such as digital elevation model (DEM) and land cover information, was employed only to constrain the damage detected in the affected areas. We verified the accuracy of our method by comparing it with the available reference data. The detection results showed an acceptable correlation with the reference data in terms of the locations of damage. Numerical evaluations indicated that our methodology could detect landslides with an accuracy exceeding 80%. In addition, the kappa coefficients for the Hiroshima and Hokkaido events were 0.30 and 0.47, respectively.
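
A classical building block for the pixel-level part of such SAR change detection is the log-ratio of pre- and post-event intensities, smoothed to suppress speckle and then thresholded. The sketch below applies this generic operator to synthetic speckled images; it is not the paper's integrated pixel-object framework, and the threshold value is invented for the example.

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(0)
shape = (64, 64)
# Pre/post SAR intensity with multiplicative gamma-distributed speckle
pre = rng.gamma(4.0, 0.25, shape)
post = rng.gamma(4.0, 0.25, shape)
post[20:40, 20:40] *= 4.0            # brightened patch standing in for a landslide scar

# Log-ratio change index, box-filtered to suppress speckle
log_ratio = np.log(post / pre)
smooth = convolve(log_ratio, np.ones((5, 5)) / 25.0)

# Fixed threshold for illustration; in practice it would be derived from the
# speckle statistics of unchanged areas
change = smooth > 0.7
```

The object-based stage of the paper then refines such a pixel mask with segmentation, DEM, and land-cover constraints to suppress false alarms.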

  30. T. Uezato, N. Yokoya, and W. He, " Illumination invariant hyperspectral image unmixing based on a digital surface model ," IEEE Transactions on Image Processing, vol. 29, no. 1, pp. 3652-3664, 2019.
    Quick Abstract

    Abstract: Although many spectral unmixing models have been developed to address spectral variability caused by variable incident illumination, the mechanism of this spectral variability is still unclear. This paper proposes an unmixing model, named illumination invariant spectral unmixing (IISU). IISU makes the first attempt to use radiance hyperspectral data and a LiDAR-derived digital surface model (DSM) in order to physically explain variable illumination and shadows in the unmixing framework. Incident angles, sky factors, and visibility from the sun derived from the DSM support the explicit explanation of endmember variability in the unmixing process from a radiance perspective. The proposed model is efficiently solved by a straightforward optimization procedure. The unmixing results showed that other state-of-the-art unmixing models did not work well, especially in shaded pixels. In contrast, the proposed model estimated more accurate abundances and shadow-compensated reflectance than the existing models.
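
Independently of the illumination modeling, the core abundance-estimation step in any such unmixing pipeline is a constrained least-squares inversion. Below is a minimal fully constrained sketch (nonnegativity via NNLS, sum-to-one via a heavily weighted row of ones) on a toy endmember matrix; the spectra and weights are invented, and this is not the IISU solver.

```python
import numpy as np
from scipy.optimize import nnls

# Toy endmember spectra (bands x endmembers) and a noiselessly mixed pixel
E = np.array([[0.1, 0.8, 0.3],
              [0.2, 0.7, 0.9],
              [0.9, 0.1, 0.4],
              [0.5, 0.3, 0.6]])
a_true = np.array([0.5, 0.3, 0.2])
pixel = E @ a_true

# Fully constrained unmixing: nonnegativity via NNLS, sum-to-one enforced
# softly by appending a heavily weighted row of ones
delta = 100.0
E_aug = np.vstack([E, delta * np.ones((1, 3))])
y_aug = np.append(pixel, delta)
abundances, residual = nnls(E_aug, y_aug)
```

On noiseless data this recovers the true abundances exactly; with real radiance data, IISU additionally corrects each endmember for the DSM-derived illumination terms before inversion.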

  31. D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, and X. X. Zhu, " Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification ," IEEE Trans. Geosci. Remote Sens. (accepted for publication), 2019.
  32. Y. Chen, L. Huang, L. Zhu, N. Yokoya, and X. Jia, " Fine-grained classification of hyperspectral imagery based on deep learning ," Remote Sensing (accepted for publication), 2019.
  33. Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, " Non-local tensor ring decomposition for hyperspectral image denoising ," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 2, pp. 1348-1362, 2019.
    Quick Abstract

    Abstract: Hyperspectral image (HSI) denoising is a fundamental problem in remote sensing and image processing. Recently, non-local low-rank tensor approximation based denoising methods have attracted much attention, due to the advantage of fully exploiting non-local self-similarity and global spectral correlation. Existing non-local low-rank tensor approximation methods are mainly based on the two common Tucker and CP decompositions and have achieved state-of-the-art results, but they suffer from drawbacks and are not the best approximation for a tensor. For example, the number of parameters of the Tucker decomposition increases exponentially with its dimension, and the CP decomposition cannot well preserve the intrinsic correlation of HSI. In this paper, we propose a non-local tensor ring (TR) approximation for HSI denoising, utilizing TR decomposition to simultaneously explore non-local self-similarity and the global spectral low-rank characteristic. TR decomposition approximates a high-order tensor as a sequence of cyclically contracted third-order tensors, which has a strong ability to exploit these two intrinsic priors and improve the HSI denoising result. Moreover, we develop an efficient proximal alternating minimization algorithm to optimize the proposed TR decomposition model. Extensive experiments on three simulated datasets under several noise levels and on two real datasets testify that the proposed TR model yields better HSI denoising results than several state-of-the-art methods in terms of quantitative and visual performance evaluations.
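
The tensor ring format itself, a sequence of cyclically contracted third-order cores closed by a trace, can be written in a few lines of NumPy. This sketch only evaluates a TR representation (the ranks and sizes are arbitrary); it does not implement the paper's proximal alternating minimization.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 6, 7, 8
r = (2, 3, 4)   # TR ranks; r[0] also closes the ring

# Three third-order TR cores
g1 = rng.normal(size=(r[0], n1, r[1]))
g2 = rng.normal(size=(r[1], n2, r[2]))
g3 = rng.normal(size=(r[2], n3, r[0]))

# X[i,j,k] = trace(G1[:,i,:] @ G2[:,j,:] @ G3[:,k,:]); the repeated index 'a'
# at both ends of the chain realizes the trace that closes the ring
x = np.einsum("aib,bjc,cka->ijk", g1, g2, g3)

# Storage grows linearly in the number of modes, unlike Tucker's core tensor
n_params_tr = g1.size + g2.size + g3.size
```

Note the contrast with Tucker: a Tucker core for the same ranks would cost the product of all ranks, which grows exponentially with the tensor order.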

  34. D. Hong, N. Yokoya, J. Chanussot, J. Xu, and X. X. Zhu, " Learning to propagate labels on graphs: An iterative multitask regression framework for semi-supervised hyperspectral dimensionality reduction ," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 158, pp. 35-49, 2019.
    Quick Abstract

    Abstract: Hyperspectral dimensionality reduction (HDR), an important preprocessing step prior to high-level data analysis, has been garnering growing attention in the remote sensing community. Although a variety of methods, both unsupervised and supervised, have been proposed for this task, the discriminative ability of the learned feature representation still remains limited due to the lack of a powerful tool that effectively exploits labeled and unlabeled data in the HDR process. A semi-supervised HDR approach, called iterative multitask regression (IMR), is proposed in this paper to address this need. IMR aims at learning a low-dimensional subspace by jointly considering labeled and unlabeled data, and bridges the learned subspace with two regression tasks: labels and pseudo-labels initialized by a given classifier. More significantly, IMR dynamically propagates the labels on a learnable graph and progressively refines the pseudo-labels, yielding a well-conditioned feedback system. Experiments conducted on three widely used hyperspectral image datasets demonstrate that the dimension-reduced features learned by the proposed IMR framework are superior, with respect to classification or recognition accuracy, to those of related state-of-the-art HDR approaches.
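
The label-propagation ingredient, spreading a few labels over an affinity graph until unlabeled samples inherit their neighbors' classes, can be sketched on 1-D toy data. This is the generic clamped propagation scheme, not IMR's learnable graph or multitask regression; all data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two 1-D clusters of 20 points each; only the first point of each is labeled
x = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(5.0, 0.3, 20)])

# Row-normalized RBF affinity graph over all samples
w = np.exp(-(x[:, None] - x[None, :]) ** 2)
p = w / w.sum(axis=1, keepdims=True)

# Propagate class distributions; clamp the two labeled nodes every iteration
f = np.zeros((40, 2))
f[0, 0] = f[20, 1] = 1.0
for _ in range(50):
    f = p @ f
    f[0], f[20] = (1.0, 0.0), (0.0, 1.0)

labels = f.argmax(axis=1)
```

IMR replaces the fixed affinity matrix with a graph learned jointly with the subspace, and feeds the propagated labels back as refined pseudo-labels.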

  35. D. Hong, J. Chanussot, N. Yokoya, J. Kang, and X. X. Zhu, " Learning shared cross-modality representation using multispectral-LiDAR and hyperspectral data ," IEEE Geosci. Remote Sens. Lett. (accepted for publication), 2019.
    Quick Abstract

    Abstract: Due to the ever-growing diversity of data sources, multi-modality feature learning has attracted more and more attention. However, most of these methods jointly learn feature representations from multi-modalities that exist in both the training and test sets, and they are less investigated in the absence of certain modalities in the test phase. To this end, in this letter, we propose to learn a shared feature space across multi-modalities in the training process. In this way, out-of-sample data from any of the multi-modalities can be directly projected onto the learned space for a more effective cross-modality representation. More significantly, the shared space is regarded as a latent subspace in our proposed method, which connects the original multi-modal samples with label information to further improve feature discrimination. Experiments are conducted on the multispectral-LiDAR and hyperspectral dataset provided by the 2018 IEEE GRSS Data Fusion Contest to demonstrate the effectiveness and superiority of the proposed method in comparison with several popular baselines.

  36. Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, " Blind cloud and cloud shadow removal of multitemporal images based on total variation regularized low-rank sparsity decomposition ," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 157, pp. 93-107, 2019.
    Quick Abstract

    Abstract: Cloud and cloud shadow (cloud/shadow) removal from multitemporal satellite images is a challenging task and has elicited much attention for subsequent information extraction. Regarding cloud/shadow areas as missing information, low-rank matrix/tensor completion based methods are popular for recovering information undergoing cloud/shadow degradation. However, existing methods require the cloud/shadow locations to be determined in advance and fail to completely use the latent information in cloud/shadow areas. In this study, we propose a blind cloud/shadow removal method for time-series remote sensing images that unifies cloud/shadow detection and removal. First, we decompose the degraded image into a low-rank clean-image (surface-reflected) component and a sparse (cloud/shadow) component, which simultaneously and completely uses the underlying characteristics of these two components. Meanwhile, spatial-spectral total variation regularization is introduced to promote the spatial-spectral continuity of the cloud/shadow component. Second, the cloud/shadow locations are detected from the sparse component using a threshold method. Finally, we adopt the cloud/shadow detection results to guide the information compensation from the original observed images to better preserve the information in cloud/shadow-free locations. The proposed model is efficiently solved using the alternating direction method of multipliers. Experiments on both simulated and real datasets demonstrate the effectiveness of our method for cloud/shadow detection and removal when compared with other state-of-the-art methods.
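
The first step, splitting a degraded image into a low-rank clean component and a sparse cloud/shadow component, can be illustrated with a simple alternating scheme: a truncated-SVD update for the low-rank part and thresholding for the sparse part. This toy decomposition (rank, threshold, and magnitudes are all invented) stands in for the paper's TV-regularized model solved by ADMM.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 40
# Rank-2 "clean surface" component plus a sparse "cloud/shadow" component
L_true = rng.normal(size=(m, 2)) @ rng.normal(size=(2, m))
mask = rng.random((m, m)) < 0.05
S_true = np.where(mask, 10.0, 0.0)
M = L_true + S_true

def rank2(a):
    """Best rank-2 approximation via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return (u[:, :2] * s[:2]) @ vt[:2]

S = np.zeros_like(M)
for _ in range(20):
    L = rank2(M - S)                      # low-rank (clean image) update
    R = M - L
    S = np.where(np.abs(R) > 3.0, R, 0.0) # sparse (cloud/shadow) update

rel_err = np.linalg.norm(L - L_true) / np.linalg.norm(L_true)
detected = S != 0                         # blind cloud/shadow detection
```

The detected support of the sparse component plays the role of the threshold-based cloud/shadow mask used in the paper's second step.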

  37. A. Samat, N. Yokoya, P. Du, S. Liu, L. Ma, Y. Ge, G. Issanova, A. Saparov, J. Abuduwaili, and C. Lin, " Direct, ECOC, ND and END frameworks—which one is the best? An empirical study of Sentinel-2A MSIL1C image classification for arid-land vegetation mapping in the Ili River Delta, Kazakhstan ," Remote Sensing, vol. 11, no. 16, pp. 1953, 2019.
    PDF    Quick Abstract

    Abstract: To facilitate the advances in Sentinel-2A products for land cover from Moderate Resolution Imaging Spectroradiometer (MODIS) and Landsat imagery, Sentinel-2A MultiSpectral Instrument Level-1C (MSIL1C) images are investigated for large-scale vegetation mapping in an arid land environment that is located in the Ili River delta, Kazakhstan. For accurate classification purposes, multi-resolution segmentation (MRS) based extended object-guided morphological profiles (EOMPs) are proposed and then compared with conventional morphological profiles (MPs), MPs with partial reconstruction (MPPR), object-guided MPs (OMPs), OMPs with mean values (OMPsM), and object-oriented (OO)-based image classification techniques. Popular classifiers, such as C4.5, an extremely randomized decision tree (ERDT), random forest (RaF), rotation forest (RoF), classification via random forest regression (CVRFR), ExtraTrees, and radial basis function (RBF) kernel-based support vector machines (SVMs) are adopted to answer the question of whether nested dichotomies (ND) and ensembles of ND (END) are truly superior to direct and error-correcting output code (ECOC) multiclass classification frameworks. 
    Finally, based on the results, the following conclusions are drawn: 1) the superior performance of OO-based techniques over MPs, MPPR, OMPs, and OMPsM is clear for Sentinel-2A MSIL1C image classification, while the best results are achieved by the proposed EOMPs; 2) the superior performance of ND, ND with class balancing (NDCB), ND with data balancing (NDDB), ND with random-pair selection (NDRPS), and ND with further centroid (NDFC) over the direct and ECOC frameworks is not confirmed, especially when using weak classifiers on low-dimensional datasets; 3) from the points of view of computational efficiency, accuracy, robustness to data dimensionality, and ease of implementation, END, ENDCB, ENDDB, and ENDRPS are alternative choices to the direct and ECOC frameworks; 4) surprisingly, because in ensemble learning (EL) theory "weaker" classifiers (ERDT here) always have a better chance of reaching the trade-off between diversity and accuracy than "stronger" classifiers (RaF, ExtraTrees, and SVM here), END with ERDT (END-ERDT) achieves the best performance, with less than a 0.5% difference in overall accuracy (OA), while being 100 to 10,000 times faster than END with RaF and ExtraTrees and ECOC with SVM across datasets of various dimensions; and 5) Sentinel-2A MSIL1C is a better choice than the land cover products from MODIS and Landsat imagery for vegetation species mapping in an arid land environment, where the vegetation species are critically important but sparsely distributed.
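
The direct-versus-ECOC comparison at the heart of the study can be reproduced in miniature with scikit-learn, which ships an error-correcting output code meta-classifier. The data are synthetic and the base learner is a plain decision tree rather than the classifiers benchmarked in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Direct multiclass: one tree handles all four classes at once
direct = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
acc_direct = direct.score(X_te, y_te)

# ECOC: each class gets a random binary codeword; code_size controls the
# number of binary subproblems relative to the number of classes
ecoc = OutputCodeClassifier(DecisionTreeClassifier(random_state=0),
                            code_size=3, random_state=0).fit(X_tr, y_tr)
acc_ecoc = ecoc.score(X_te, y_te)
```

Nested dichotomies, the other framework studied, would instead arrange the binary subproblems in a tree of class splits rather than a codebook.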

  38. Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, " Hyperspectral image restoration using weighted group sparsity regularized low-rank tensor decomposition ," IEEE Transactions on Cybernetics (accepted for publication), 2019.
    Quick Abstract

    Abstract: Mixed noise (such as Gaussian, impulse, stripe, and deadline noise) contamination is a common phenomenon in hyperspectral imagery (HSI), greatly degrading visual quality and affecting subsequent processing accuracy. By encoding a sparsity prior on the spatial or spectral difference images, total variation (TV) regularization is an efficient tool for removing such noise. However, previous TV terms cannot maintain the shared group sparsity pattern of the spatial difference images of different spectral bands. To address this issue, this study proposes a group sparsity regularization of the spatial difference images for HSI restoration. Instead of using an L1- or L2-norm (sparsity) on the difference image itself, we introduce a weighted L2,1-norm to constrain the spatial difference image cube, efficiently exploiting the shared group sparsity pattern. Moreover, we employ the well-known low-rank Tucker decomposition to capture the global spatial-spectral correlation from the three HSI dimensions. In summary, a weighted group sparsity regularized low-rank tensor decomposition (LRTDGS) method is presented for HSI restoration. An efficient augmented Lagrange multiplier algorithm is employed to solve the LRTDGS model. The superiority of this method for HSI restoration is demonstrated by a series of experimental results on both simulated and real data, as compared to other state-of-the-art TV-regularized low-rank matrix/tensor decomposition methods.
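
The distinction the abstract draws, a weighted L2,1-norm on the difference-image cube versus a plain L1 norm, comes down to grouping the bands at each spatial position. Below is a minimal NumPy sketch of the (unweighted) L2,1 computation on a random cube; the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
bands, h, w = 5, 8, 8
hsi = rng.normal(size=(bands, h, w))

# Horizontal spatial difference image for every band
diff = np.diff(hsi, axis=2)                      # shape (bands, h, w-1)

# L2,1 norm of the cube: an L2 norm across bands at each spatial position
# (the "group"), then an L1 sum over positions -- this couples the sparsity
# pattern shared by all bands
group_energy = np.sqrt((diff ** 2).sum(axis=0))  # shape (h, w-1)
l21 = group_energy.sum()

# A plain L1 norm, by contrast, treats every band independently
l1 = np.abs(diff).sum()
```

Minimizing the L2,1 term drives whole band-groups of differences to zero together, so edges appear at the same spatial locations in all bands; the paper additionally weights each group, which this sketch omits.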

  39. D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, " CoSpace: Common subspace learning from hyperspectral-multispectral correspondences ," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4349-4359, 2019.
    PDF    Quick Abstract

    Abstract: With a large amount of open satellite multispectral (MS) imagery (e.g., Sentinel-2 and Landsat-8), considerable attention has been paid to global MS land cover classification. However, the limited spectral information of MS data hinders further improvement of the classification performance. Hyperspectral imaging enables discrimination between spectrally similar classes, but its swath width from space is narrow compared to that of MS sensors. To achieve accurate land cover classification over a large coverage, we propose a cross-modality feature learning framework, called common subspace learning (CoSpace), which jointly considers subspace learning and supervised classification. By locally aligning the manifold structure of the two modalities, CoSpace linearly learns a shared latent subspace from hyperspectral-MS (HS-MS) correspondences. The MS out-of-samples can then be projected into the subspace, where they are expected to take advantage of the rich spectral information of the corresponding hyperspectral data used for learning, thus leading to better classification. Extensive experiments on two simulated HS-MS datasets (University of Houston and Chikusei), where the HS-MS data exhibit trade-offs between coverage and spectral resolution, demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.

  40. W. He, N. Yokoya, L. Yuan, and Q. Zhao, " Remote sensing image reconstruction using tensor ring completion and total-variation ," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 8998-9009, 2019.
    Quick Abstract

    Abstract: Time-series remote sensing (RS) images are often corrupted by various types of missing information, such as dead pixels, clouds, and cloud shadows, that significantly influence subsequent applications. In this paper, we introduce a new low-rank tensor decomposition model, termed tensor ring (TR) decomposition, to the analysis of RS datasets and propose a TR completion method for missing information reconstruction. The proposed TR completion model has the ability to utilize the low-rank property of time-series RS images along different dimensions. To further exploit the smoothness of the spatial information of RS images, total-variation regularization is also incorporated into the TR completion model. The proposed model is efficiently solved using two algorithms, the augmented Lagrange multiplier (ALM) and the alternating least squares (ALS) methods. The simulated and real data experiments show superior performance compared to other state-of-the-art low-rank-based algorithms.
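The tensor ring model represents a K-way tensor by a cyclic chain of 3-way cores G_k of size r_k x n_k x r_{k+1}, with the last rank equal to the first so the chain closes. A small NumPy sketch of the contraction from cores back to the full tensor, useful for inspecting a completion result; this is the generic TR reconstruction, not the paper's ALM/ALS solver:

```python
import numpy as np

def tensor_ring_to_full(cores):
    """Contract tensor ring cores G_k of shape (r_k, n_k, r_{k+1})
    (with the final rank equal to the first, closing the ring)
    into the full K-way tensor of shape (n_1, ..., n_K)."""
    full = cores[0]                               # (r_1, n_1, r_2)
    for core in cores[1:]:
        # Merge the next core: (r_1, N, r) x (r, n, r') -> (r_1, N*n, r')
        full = np.einsum('anb,bmc->anmc', full, core)
        full = full.reshape(full.shape[0], -1, full.shape[-1])
    # Close the ring by tracing over the matched boundary ranks
    full = np.einsum('aia->i', full)
    return full.reshape([c.shape[1] for c in cores])
```

Element-wise, entry (i_1, ..., i_K) equals the trace of the product of core slices G_1[:, i_1, :] ... G_K[:, i_K, :], which is the defining property of the TR format.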

  41. Y. Xu, B. Du, L. Zhang, D. Cerra, M. Pato, E. Carmona, S. Prasad, N. Yokoya, R. Hänsch, and B. Le Saux, " Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS Data Fusion Contest ," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 6, pp. 1709-1724, 2019.
    Quick Abstract

    Abstract: This paper presents the scientific outcomes of the 2018 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2018 Contest addressed the problem of urban observation and monitoring with advanced multi-source optical remote sensing (multispectral LiDAR, hyperspectral imaging, and very high-resolution imagery). The competition was based on urban land use and land cover classification, aiming to distinguish between very diverse and detailed classes of urban objects, materials, and vegetation. Besides data fusion, it also quantified the respective assets of the novel sensors used to collect the data. Participants proposed elaborate approaches rooted in remote sensing as well as in machine learning and computer vision to make the most of the available data. Winning approaches combined convolutional neural networks with subtle earth-observation domain expertise.

  42. B. Adriano, J. Xia, G. Baier, N. Yokoya, and S. Koshimura, " Multi-source data fusion based on ensemble learning for rapid building damage mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia ," Remote Sensing, vol. 11, no. 7, p. 886, 2019.
    PDF    Quick Abstract

    Abstract: This work presents a detailed analysis of building damage recognition, employing multi-source data fusion and ensemble learning algorithms for rapid damage mapping tasks. A damage classification framework is introduced and tested to categorize the building damage following the recent 2018 Sulawesi earthquake and tsunami. Three robust ensemble learning classifiers were investigated for recognizing building damage from SAR and optical remote sensing datasets and their derived features. The contribution of each feature dataset was also explored, considering different combinations of sensors as well as their temporal information. SAR scenes acquired by the ALOS-2 PALSAR-2 and Sentinel-1 sensors were used. The optical Sentinel-2 and PlanetScope sensors were also included in this study. A non-local filter was used in the preprocessing phase to enhance the SAR features. Our results demonstrated that the canonical correlation forests classifier performs better than the other classifiers. In the data fusion analysis, DEM- and SAR-derived features contributed the most to the overall damage classification. Our proposed mapping framework successfully classifies four levels of building damage (with overall accuracy > 90% and average accuracy > 67%). The proposed framework learned the damage patterns from the limited available human-interpreted building damage annotations and extends this information to map a larger affected area. The entire process, including the pre- and post-processing phases, was completed in about three hours after acquiring all raw datasets.

  43. P. Ghamisi, B. Rasti, N. Yokoya, Q. Wang, B. Höfle, L. Bruzzone, F. Bovolo, M. Chi, K. Anders, R. Gloaguen, P. M. Atkinson, and J. A. Benediktsson, " Multisource and Multitemporal Data Fusion in Remote Sensing ," IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6-39, 2019.
  44. T. D. Pham, N. Yokoya, D. T. Bui, K. Yoshino, and D. A. Friess, " Remote sensing approaches for monitoring mangrove species, structure and biomass: opportunities and challenges ," Remote Sensing, vol. 11, no. 3, p. 230, 2019.
    PDF    Quick Abstract

    Abstract: The mangrove ecosystem plays a vital role in the global carbon cycle, by reducing greenhouse gas emissions and mitigating the impacts of climate change. However, mangroves have been lost worldwide, resulting in substantial carbon stock losses. Additionally, some aspects of the mangrove ecosystem remain poorly characterized compared to other forest ecosystems due to practical difficulties in measuring and monitoring mangrove biomass and carbon stocks. Without a quantitative method for effectively monitoring biophysical parameters and carbon stocks in mangroves, robust policies and actions for sustainably conserving mangroves in the context of climate change mitigation and adaptation are more difficult to develop. In this context, remote sensing provides an important tool for monitoring mangroves and identifying attributes such as species, biomass, and carbon stocks. A wide range of studies is based on optical imagery (aerial photography, multispectral, and hyperspectral) and synthetic aperture radar (SAR) data. Remote sensing approaches have been proven effective for mapping mangrove species, estimating their biomass, and assessing changes in their extent. This review provides an overview of the techniques currently used to map various attributes of mangroves, summarizes the studies undertaken since 2010 on a variety of remote sensing applications for monitoring mangroves, and addresses the limitations of these studies. We see several key future directions for the potential use of remote sensing techniques combined with machine learning for mapping mangrove areas and species, and evaluating their biomass and carbon stocks.

  45. D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, " An augmented linear mixing model to address spectral variability for hyperspectral unmixing ," IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1923-1938, 2018.
    PDF    Quick Abstract

    Abstract: Hyperspectral imagery collected from airborne or satellite sources inevitably suffers from spectral variability, making it difficult for spectral unmixing to accurately estimate abundance maps. The classical unmixing model, the linear mixing model (LMM), generally fails to handle this issue effectively. To this end, we propose a novel spectral mixture model, called the augmented linear mixing model (ALMM), to address spectral variability by applying a data-driven learning strategy to the inverse problem of hyperspectral unmixing. The proposed approach models the main spectral variability (i.e., scaling factors), generated by variations in illumination or topography, separately by means of the endmember dictionary. It then models other spectral variabilities caused by environmental conditions (e.g., local temperature and humidity, atmospheric effects) and instrumental configurations (e.g., sensor noise), as well as material nonlinear mixing effects, by introducing a spectral variability dictionary. To effectively run the data-driven learning strategy, we also propose reasonable prior knowledge for the spectral variability dictionary, whose atoms are assumed to be low-coherent with the spectral signatures of the endmembers, leading to a well-known low-coherence dictionary learning problem. Thus, a dictionary learning technique is embedded in the spectral unmixing framework so that the algorithm can learn the spectral variability dictionary and estimate the abundance maps simultaneously. Extensive experiments on synthetic and real datasets are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
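The ALMM forward model augments the linear mixture with a variability term, y = E a + D b + noise, where E is the endmember dictionary and D the spectral variability dictionary. A hedged synthetic sketch of that forward model with a plain joint least-squares inversion (the dimensions and the unconstrained solve are illustrative; the paper additionally learns D under a low-coherence prior and constrains the abundances):

```python
import numpy as np

rng = np.random.default_rng(0)
L, P, K = 50, 4, 6                        # bands, endmembers, variability atoms
E = rng.random((L, P))                    # endmember dictionary (illustrative)
D = rng.standard_normal((L, K)) * 0.05    # spectral variability dictionary
a_true = rng.dirichlet(np.ones(P))        # abundances (non-negative, sum to one)
b_true = rng.standard_normal(K)           # variability coefficients
y = E @ a_true + D @ b_true               # augmented linear mixture (noise-free)

# Joint least-squares estimate of abundances and variability coefficients
M = np.hstack([E, D])                     # stacked dictionary [E, D]
coef, *_ = np.linalg.lstsq(M, y, rcond=None)
a_hat, b_hat = coef[:P], coef[P:]
```

With noise-free data and a full-column-rank stacked dictionary, this recovers the true coefficients exactly; the variability term D b absorbs what the pure endmember mixture E a cannot explain.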

  46. D. Hong, N. Yokoya, N. Ge, J. Chanussot, and X. X. Zhu, " Learnable manifold alignment (LeMA): A semi-supervised cross-modality learning framework for land cover and land use classification ," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 147, pp. 193-205, 2018.
    PDF    Quick Abstract

    Abstract: In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community: can a limited amount of highly-discriminative (e.g., hyperspectral) training data improve the performance of a classification task using a large amount of poorly-discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data are expensive and time-consuming to collect at scale compared to multispectral data. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods.
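The learned graph then drives a standard graph-based label propagation step. A minimal sketch of closed-form label propagation on a given affinity matrix W (this is the generic symmetrically-normalized propagation, not LeMA's learned-graph or ADMM machinery; `alpha` is the usual smoothing parameter):

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9):
    """Closed-form graph label propagation.

    W: (n, n) symmetric affinity matrix with positive node degrees.
    Y: (n, c) one-hot seed labels (zero rows for unlabeled nodes).
    Solves F = (I - alpha * S)^{-1} (1 - alpha) Y with the
    symmetrically normalized graph S = D^{-1/2} W D^{-1/2}.
    """
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))      # symmetric normalization
    F = np.linalg.solve(np.eye(len(W)) - alpha * S, (1 - alpha) * Y)
    return F.argmax(axis=1)              # per-node predicted class
```

On a graph with two tightly connected clusters and one seed per cluster, the labels diffuse along the strong edges and each cluster inherits its seed's class.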

  47. W. He and N. Yokoya, " Multi-temporal Sentinel-1 and -2 data fusion for optical image simulation ," ISPRS International Journal of Geo-Information, vol. 7, no. 10, pp. 389, 2018.
    PDF    Quick Abstract

    Abstract: In this paper, we present optical image simulation from synthetic aperture radar (SAR) data using deep learning based methods. Two models, i.e., optical image simulation directly from SAR data and from multi-temporal SAR-optical data, are proposed to test their feasibility. The deep learning based methods chosen to realize the models are a convolutional neural network (CNN) with a residual architecture and a conditional generative adversarial network (cGAN). We validate our models using Sentinel-1 and -2 datasets. The experiments demonstrate that the model with multi-temporal SAR-optical data can successfully simulate the optical image, whereas the model with only SAR data as input fails. The optical image simulation results indicate the possibility of SAR-optical information blending for subsequent applications such as large-scale cloud removal and temporal super-resolution of optical data. We also investigate the sensitivity of the proposed models to the training samples, and point out possible future directions.

  48. L. Guanter, M. Brell, J. C.-W. Chan, C. Giardino, J. Gomez-Dans, C. Mielke, F. Morsdorf, K. Segl, and N. Yokoya, " Synergies of spaceborne imaging spectroscopy with other remote sensing approaches ," Surveys in Geophysics, pp. 1-31, 2018.
    Quick Abstract

    Abstract: Imaging spectroscopy (IS), also commonly known as hyperspectral remote sensing, is a powerful remote sensing technique for the monitoring of the Earth's surface and atmosphere. Pixels in optical hyperspectral images consist of continuous reflectance spectra formed by hundreds of narrow spectral channels, allowing an accurate representation of the surface composition through spectroscopic techniques. However, technical constraints in the design of imaging spectrometers mean that spectral coverage and resolution are usually traded off against spatial resolution and swath width, as opposed to optical multispectral (MS) systems, which are typically designed to maximize spatial and/or temporal resolution. This complementarity suggests that a synergistic exploitation of spaceborne IS and MS data would be an optimal way to fulfill those remote sensing applications requiring not only high spatial and temporal resolution data, but also rich spectral information. On the other hand, IS has been shown to have strong synergistic potential with non-optical remote sensing methods, such as thermal infrared (TIR) and light detection and ranging (LiDAR). In this contribution, we review theoretical and methodological aspects of potential synergies between optical IS and other remote sensing techniques. The focus is put on the evaluation of synergies between spaceborne optical IS and MS systems because of the expected availability of the two types of data in the next years. Short reviews of potential synergies of IS with TIR and LiDAR measurements are also provided.

  49. J. Xia, N. Yokoya, and A. Iwasaki, " Fusion of hyperspectral and LiDAR data with a novel ensemble classifier ," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 6, pp. 957-961, 2018.
    Quick Abstract

    Abstract: Due to the development of sensors and data acquisition technology, the fusion of features from multiple sensors has become a very active research topic. In this letter, the use of morphological features to fuse a hyperspectral (HS) image and a light detection and ranging (LiDAR)-derived digital surface model (DSM) is exploited via an ensemble classifier. In each iteration, we first apply morphological openings and closings with partial reconstruction on the first few principal components (PCs) of the HS and LiDAR datasets to produce morphological features that model spatial and elevation information. Second, three groups of features (i.e., spectral features, and morphological features of the HS and LiDAR data) are split into several disjoint subsets. Third, data transformation is applied to each subset and the features extracted from each subset are stacked as the input of a random forest (RF) classifier. Three data transformation methods, including principal component analysis (PCA), linearity preserving projection (LPP), and unsupervised graph fusion (UGF), are introduced into the ensemble classification process. Finally, we integrate the classification results achieved at each step by a majority vote. Experimental results on co-registered HS and LiDAR-derived DSM demonstrate the effectiveness and potential of the proposed ensemble classifier.
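The final integration step of the ensemble is a plain majority vote over the per-subset random forest predictions. A minimal NumPy sketch of that vote (the helper name is illustrative; labels are assumed to be non-negative integers):

```python
import numpy as np

def majority_vote(predictions):
    """Combine label predictions from several ensemble members.

    predictions: array of shape (n_members, n_samples) of integer labels.
    Returns the per-sample majority label (ties broken by smallest label).
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # Count votes per class for each sample: result is (n_classes, n_samples)
    votes = np.apply_along_axis(np.bincount, 0, predictions,
                                minlength=n_classes)
    return votes.argmax(axis=0)
```

Each column of `predictions` holds the member votes for one pixel; `bincount` tallies them per class and `argmax` picks the winner.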

  50. P. Ghamisi and N. Yokoya, " IMG2DSM: Height simulation from single imagery using conditional generative adversarial nets ," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 794-798, 2018.
    Quick Abstract

    Abstract: This paper proposes a novel approach in the remote sensing community for simulating a digital surface model (DSM) from a single optical image. The technique uses conditional generative adversarial nets whose architecture is based on an encoder-decoder network with skip connections (generator) and penalizing structures at the scale of image patches (discriminator). The network is trained on scenes where both DSM and optical data are available to establish an image-to-DSM translation rule. The trained network is then used to simulate elevation information on target scenes where no corresponding elevation information exists. The capability of the approach is evaluated both visually (in terms of photo interpretation) and quantitatively (in terms of reconstruction errors and classification accuracies) on sub-decimeter spatial resolution datasets captured over Vaihingen, Potsdam, and Stockholm. The results confirm the promising performance of the proposed framework.

  51. N. Yokoya, P. Ghamisi, J. Xia, S. Sukhanov, R. Heremans, I. Tankoyeu, B. Bechtel, B. Le Saux, G. Moser, and D. Tuia, " Open data for global multimodal land use classification: Outcome of the 2017 IEEE GRSS Data Fusion Contest ," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1363-1377, 2018.
    PDF    Quick Abstract

    Abstract: In this paper, we present the scientific outcomes of the 2017 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2017 Contest was aimed at addressing the problem of local climate zones classification based on a multitemporal and multimodal dataset, including image (Landsat 8 and Sentinel-2) and vector data (from OpenStreetMap). The competition, based on separate geographical locations for the training and testing of the proposed solutions, aimed at models that were accurate (assessed by accuracy metrics on an undisclosed reference for the test cities), general (assessed by spreading the test cities across the globe), and computationally feasible (assessed by having a test phase of limited time). The techniques proposed by the Contest participants spanned a rather broad range of topics, mixing ideas and methodologies from computer vision and machine learning while remaining deeply rooted in the specificities of remote sensing. In particular, rigorous atmospheric correction, the use of multidate images, and the use of ensemble methods fusing results obtained from different data sources/time instants made the difference.

  52. B. Le Saux, N. Yokoya, R. Hänsch, and S. Prasad, " 2018 IEEE GRSS Data Fusion Contest: Multimodal land use classification ," IEEE Geoscience and Remote Sensing Magazine, vol. 6, no. 1, pp. 52-54, 2018.

Conference Papers

  1. T. Uezato, D. Hong, N. Yokoya, and W. He, "Guided deep decoder: Unsupervised image pair fusion," European Conference on Computer Vision (ECCV) (spotlight), 2020.
  2. W. He, Q. Yao, C. Li, N. Yokoya, and Q. Zhao, "Non-local meets global: An integrated paradigm for hyperspectral denoising," Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  3. W. He, L. Yuan, and N. Yokoya, "Total-variation-regularized tensor ring completion for remote sensing image reconstruction," International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.
  4. V. Ferraris, N. Yokoya, N. Dobigeon, and M. Chabert, "A comparative study of fusion-based change detection methods for multi-band images with different spectral and spatial resolutions," IEEE International Geoscience and Remote Sensing Symposium, 2018.
  5. J. Xia, N. Yokoya, and A. Iwasaki, "Boosting for domain adaptation extreme learning machines for hyperspectral image classification," IEEE International Geoscience and Remote Sensing Symposium, 2018.
  6. D. Hong, N. Yokoya, J. Xu, and X. X. Zhu, "Joint & progressive learning from high-dimensional data for multi-label classification," European Conference on Computer Vision, 2018.