DLGCNet: Multimodal remote sensing semantic segmentation via dual diagonal low-rank adaptation and graph convolutional feature fusion / J. Zeng, J. Xu, X. Jia, B. Deng, Y. Zhai, C. Qin, P. Coscia, A. Genovese, X. Tian. - In: KNOWLEDGE-BASED SYSTEMS. - ISSN 0950-7051. - 347:(2026), pp. 116299.1-116299.13. [10.1016/j.knosys.2026.116299]
DLGCNet: Multimodal remote sensing semantic segmentation via dual diagonal low-rank adaptation and graph convolutional feature fusion
P. Coscia; A. Genovese
2026
Abstract
In recent years, multimodal remote sensing images (MRSI) have been shown to provide complementary cross-modal information owing to their heterogeneous characteristics, enabling more detailed and effective scene interpretation than single-modality data. To address challenges in existing multimodal fusion methods, this paper proposes DLGCNet, a multimodal segmentation network that jointly optimizes a dual diagonal low-rank adaptation (D2LoRA) training framework for visual foundation models (VFMs) and a graph convolutional feature fusion (GCFF) module. To better adapt to the data distributions of MRSI, D2LoRA introduces two trainable diagonal matrices that perform row-wise and column-wise transformations on the low-rank weight matrix, thereby improving the VFM’s adaptability and feature extraction performance for MRSI. To overcome the limited cross-modal modeling capacity of convolutional neural network-based fusion and the excessive complexity of transformer-based fusion, GCFF dynamically adjusts the graph Laplacian according to modal information and establishes long-range cross-modal dependencies with lower computational complexity than transformer-based fusion methods. Experimental results show that, compared with current state-of-the-art multimodal data fusion methods, DLGCNet achieves the best segmentation results on three datasets: Potsdam, Vaihingen, and WHU-OPT-SAR. The source code is available at https://github.com/2023xjh2023/DLGCNet.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 1-s2.0-S0950705126010257-main.pdf | Restricted access | Publisher's version/PDF | No license | 4.77 MB | Adobe PDF |
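The abstract describes D2LoRA as adding two trainable diagonal matrices that apply row-wise and column-wise transformations to the low-rank weight matrix. The abstract does not give the exact formulation, so the following is only a minimal sketch under one assumed reading: the frozen weight `W` receives the update `diag(r) @ (B @ A) @ diag(c)`, where `B @ A` is the usual LoRA product. The function names, the vectors `r` and `c`, and the toy values are all hypothetical and are not taken from the paper or its repository.

```python
# Illustrative sketch only; NOT the authors' implementation.
# Assumption: adapted weight = W + diag(r) @ (B @ A) @ diag(c).

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def d2lora_delta(B, A, r, c):
    """Low-rank update B @ A with row i scaled by r[i] and column j scaled by c[j]."""
    low_rank = matmul(B, A)  # shape d_out x d_in, rank at most the inner dimension
    return [
        [r[i] * c[j] * low_rank[i][j] for j in range(len(c))]
        for i in range(len(r))
    ]


def adapted_weight(W, B, A, r, c):
    """Frozen weight W plus the diagonally rescaled low-rank update."""
    delta = d2lora_delta(B, A, r, c)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


# Toy example: a 3 x 2 weight with a rank-1 update (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # frozen pretrained weight
B = [[1.0], [2.0], [3.0]]                 # d_out x rank
A = [[1.0, -1.0]]                         # rank x d_in
r = [1.0, 0.5, 2.0]                       # diagonal of the row-scaling matrix
c = [1.0, 3.0]                            # diagonal of the column-scaling matrix

W_adapted = adapted_weight(W, B, A, r, c)
print(W_adapted)
```

Under this assumed form, the two diagonals add only `d_out + d_in` trainable scalars on top of the LoRA factors, and setting every entry of `r` and `c` to 1 recovers a plain LoRA update.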




