Bidirectional Interactive Multi-Scale Aggregation Network for Vehicle Detection in Urban Traffic / C. Dong, W. Qiu, Y. Li, Y. Zhai, X. Liu, C. Mai, H. Zhu, J. Zhou, P. Coscia, A. Genovese, C.L.P. Chen. - In: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. - ISSN 1524-9050. - (2025), pp. 1-21. [Epub ahead of print] [10.1109/tits.2025.3639483]
Bidirectional Interactive Multi-Scale Aggregation Network for Vehicle Detection in Urban Traffic
P. Coscia; A. Genovese
2025
Abstract
Existing UAV vehicle-detection datasets, typically captured under static and uniform illumination, fail to adequately represent the variable lighting conditions, dense traffic, and frequent occlusions observed in real-world transportation hubs. To bridge this gap, a new dataset, UAV-HubSurveillance, is introduced to capture complex vehicle interactions across urban transportation nodes under diverse environmental scenarios. Although UAV-HubSurveillance provides rich and multidimensional interaction data, it still suffers from severe occlusions and adverse weather conditions that hinder detection and identification accuracy. To address these limitations, a novel vehicle detection framework, termed bidirectional interactive multi-scale aggregation YOLO (BIMSA-YOLO), is proposed, which integrates bidirectional feature interaction with adaptive multi-scale aggregation to enhance detection robustness. First, the bidirectional shallow fusion module (BSFM) facilitates cross-resolution information exchange through a lightweight gating strategy, preserving fine-grained details of small objects. Second, the interactive deep fusion module (IDFM) reinforces contextual coherence via attention-guided cross-level semantic fusion. Third, the multi-scale adaptive aggregation module (MSAAM) dynamically aligns and integrates multi-scale features to improve robustness against scale variation. Extensive experiments conducted on the UAV-HubSurveillance dataset demonstrate that BIMSA-YOLO significantly enhances detection performance under dynamic, occluded, and adverse-weather conditions. Specifically, the proposed model achieves an mAP@0.5 of 63.2%, surpassing the baseline by 5.3 percentage points. Furthermore, BIMSA-YOLO exhibits strong generalization capabilities on the VisDrone and CARPK datasets. Our code and dataset are available at https://github.com/yikuizhai/BIMSA-YOLO
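For readers unfamiliar with the headline metric, the mAP at an IoU threshold of 0.5 reported in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the names `iou` and `average_precision`, the greedy matching rule, and the toy boxes are all assumptions made for illustration, and benchmark toolkits differ in details such as interpolation and duplicate handling. The sketch computes average precision (AP) for a single class; mAP is the mean of such AP values over all classes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: List[Tuple[float, Box]], gts: List[Box],
                      thr: float = 0.5) -> float:
    """Single-class AP: area under the precision-recall curve.

    Simplifying assumption: each detection is greedily matched to the
    best still-unmatched ground-truth box; official toolkits may differ.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_flags = []
    # Visit detections from highest to lowest confidence score.
    for _score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if matched[j]:
                continue
            v = iou(box, gt)
            if v > best_iou:
                best_iou, best_j = v, j
        is_tp = best_j >= 0 and best_iou >= thr
        if is_tp:
            matched[best_j] = True
        tp_flags.append(is_tp)

    # Cumulative precision and recall after each detection.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas under the envelope.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


# Toy data (made up): two ground-truth vehicles, three scored detections.
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 28))]
ap_50 = average_precision(dets, gts, thr=0.5)

# Arithmetic implied by the abstract: 63.2% minus a 5.3-point gain.
implied_baseline = round(63.2 - 5.3, 1)
```

In the toy data the second detection overlaps no ground-truth box and counts as a false positive, which lowers precision at higher recall. The last line only restates the abstract's figures: a 5.3-point improvement to 63.2% implies a baseline mAP@0.5 of about 57.9%.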




