Bidirectional Interactive Multi-Scale Aggregation Network for Vehicle Detection in Urban Traffic / C. Dong, W. Qiu, Y. Li, Y. Zhai, X. Liu, C. Mai, H. Zhu, J. Zhou, P. Coscia, A. Genovese, C.L.P. Chen. - In: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. - ISSN 1524-9050. - (2025), pp. 1-21. [Epub ahead of print] [10.1109/tits.2025.3639483]

Bidirectional Interactive Multi-Scale Aggregation Network for Vehicle Detection in Urban Traffic

P. Coscia; A. Genovese
2025

Abstract

Existing UAV vehicle-detection datasets are typically captured under static, uniform illumination and therefore fail to represent the variable lighting, dense traffic, and frequent occlusions found at real-world transportation hubs. To bridge this gap, a new dataset, UAV-HubSurveillance, is introduced to capture complex vehicle interactions across urban transportation nodes under diverse environmental scenarios. Although UAV-HubSurveillance provides rich, multidimensional interaction data, the severe occlusions and adverse weather conditions it contains hinder detection and identification accuracy. To address these challenges, a novel vehicle detection framework, bidirectional interactive multi-scale aggregation YOLO (BIMSA-YOLO), is proposed; it integrates bidirectional feature interaction with adaptive multi-scale aggregation to enhance detection robustness. First, the bidirectional shallow fusion module (BSFM) facilitates cross-resolution information exchange through a lightweight gating strategy, preserving the fine-grained details of small objects. Second, the interactive deep fusion module (IDFM) reinforces contextual coherence via attention-guided cross-level semantic fusion. Third, the multi-scale adaptive aggregation module (MSAAM) dynamically aligns and integrates multi-scale features to improve robustness against scale variation. Extensive experiments on the UAV-HubSurveillance dataset show that BIMSA-YOLO improves detection performance under dynamic, occluded, and adverse-weather conditions: the proposed model achieves an mAP@0.5 of 63.2%, surpassing the baseline by 5.3 percentage points. BIMSA-YOLO also generalizes well to the VisDrone and CARPK datasets. The code and dataset are available at https://github.com/yikuizhai/BIMSA-YOLO.
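The abstract does not specify how the gating strategy in BSFM is built. As a purely illustrative sketch of the general idea of gated cross-resolution fusion, and not the authors' module, the following blends a high-resolution feature row with an upsampled low-resolution one through a per-element sigmoid gate. The function names, the scalar weights `w_fine`, `w_coarse`, and `bias`, and the 1-D setting are all assumptions made for brevity.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(row, factor):
    """1-D nearest-neighbour upsampling: repeat each value `factor` times."""
    return [value for value in row for _ in range(factor)]


def gated_fuse(fine, coarse, w_fine=1.0, w_coarse=1.0, bias=0.0):
    """Fuse a high-resolution row with a low-resolution one (hypothetical helper).

    A gate g = sigmoid(w_fine * f + w_coarse * c + bias) is computed per element,
    and the output is the convex combination g * f + (1 - g) * c. In a trained
    network the scalar weights would be replaced by learned layers; fixed
    scalars are used here only for illustration.
    """
    if not coarse or len(fine) % len(coarse) != 0:
        raise ValueError("len(fine) must be a positive multiple of len(coarse)")
    coarse_up = upsample_nearest(coarse, len(fine) // len(coarse))
    fused = []
    for f, c in zip(fine, coarse_up):
        g = sigmoid(w_fine * f + w_coarse * c + bias)
        fused.append(g * f + (1.0 - g) * c)
    return fused


if __name__ == "__main__":
    fine_row = [1.0, 0.0, 2.0, 2.0]   # high-resolution features (made-up values)
    coarse_row = [0.0, 2.0]           # low-resolution features (made-up values)
    print(gated_fuse(fine_row, coarse_row))
```

Because the gate lies strictly between 0 and 1, every fused value falls between the corresponding fine and coarse values, so neither resolution can be discarded outright; this is one common motivation for gating over plain addition or concatenation.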
Vehicle detection; Feature extraction; Autonomous aerial vehicles; Vehicle dynamics; Semantics; Robustness; Real-time systems; Surveillance; Lighting; Accuracy
Sector INFO-01/A - Computer Science
2025
24 Dec 2025
Article (author)
Use this identifier to cite or link to this document: https://hdl.handle.net/2434/1207255