DLGCNet: Multimodal remote sensing semantic segmentation via dual diagonal low-rank adaptation and graph convolutional feature fusion / J. Zeng, J. Xu, X. Jia, B. Deng, Y. Zhai, C. Qin, P. Coscia, A. Genovese, X. Tian. - In: KNOWLEDGE-BASED SYSTEMS. - ISSN 0950-7051. - 347:(2026), pp. 116299.1-116299.13. [10.1016/j.knosys.2026.116299]

DLGCNet: Multimodal remote sensing semantic segmentation via dual diagonal low-rank adaptation and graph convolutional feature fusion

P. Coscia; A. Genovese (penultimate author)
2026

Abstract

Multimodal remote sensing images (MRSI) carry complementary cross-modal information owing to their heterogeneous characteristics, enabling more detailed and effective scene interpretation than single-modality data. To address the limitations of existing multimodal fusion methods, this paper proposes DLGCNet, a multimodal segmentation network that jointly optimizes a dual diagonal low-rank adaptation (D2LoRA) training framework for visual foundation models (VFMs) and a graph convolutional feature fusion (GCFF) module. To better adapt to the data distributions of MRSI, D2LoRA introduces two trainable diagonal matrices that apply row-wise and column-wise transformations to the low-rank weight matrix, improving the VFM's adaptability and feature extraction performance on MRSI. To overcome the limited cross-modal modeling capacity of fusion based on convolutional neural networks and the excessive complexity of transformer-based fusion, GCFF dynamically adjusts the graph Laplacian according to modal information and establishes long-range cross-modal dependencies at lower computational complexity than transformer-based fusion methods. Experimental results show that the proposed DLGCNet achieves the best segmentation results among current state-of-the-art multimodal data fusion methods on three datasets: Potsdam, Vaihingen, and WHU-OPT-SAR. The source code is available at https://github.com/2023xjh2023/DLGCNet.
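The D2LoRA idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the update takes the form diag(row_scale) · (B · A) · diag(col_scale) added to a frozen weight, and every name, shape, and initialization below (`d2lora_delta`, `adapted_forward`, `row_scale`, `col_scale`, the zero-initialized `B`) is hypothetical. The exact formulation should be checked against the paper and the linked repository.

```python
import numpy as np


def d2lora_delta(B, A, row_scale, col_scale):
    """Low-rank update B @ A rescaled row-wise and column-wise.

    Assumed form: diag(row_scale) @ (B @ A) @ diag(col_scale).
    Broadcasting avoids building the two diagonal matrices explicitly.
    """
    return row_scale[:, None] * (B @ A) * col_scale[None, :]


def adapted_forward(x, W_frozen, B, A, row_scale, col_scale):
    """Linear layer with a frozen weight plus the rescaled low-rank update."""
    W_eff = W_frozen + d2lora_delta(B, A, row_scale, col_scale)
    return x @ W_eff.T


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 8, 2

W_frozen = rng.normal(size=(d_out, d_in))  # pretrained weight, kept fixed
B = np.zeros((d_out, rank))                # assumed LoRA-style zero init
A = rng.normal(size=(rank, d_in))
row_scale = np.ones(d_out)                 # trainable diagonal (rows)
col_scale = np.ones(d_in)                  # trainable diagonal (columns)

x = rng.normal(size=(4, d_in))             # batch of 4 input features
y = adapted_forward(x, W_frozen, B, A, row_scale, col_scale)
```

Under these assumptions, the two diagonals add only `d_out + d_in` trainable parameters on top of the `rank * (d_out + d_in)` parameters of a plain low-rank update, and the update keeps rank at most `rank`.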
Multimodal fusion; Remote sensing; Semantic segmentation; Parameter-efficient fine-tuning
Sector INFO-01/A - Computer Science
25 May 2026
Article (author)
Files in this item:
File: 1-s2.0-S0950705126010257-main.pdf (restricted access)
Type: Publisher's version/PDF
License: No license
Size: 4.77 MB
Format: Adobe PDF
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1249949