On the relevance of patch-based extraction methods for monocular depth estimation / P. Coscia, A. Fusillo, A. Genovese, V. Piuri, F. Scotti. - In: IMAGE AND VISION COMPUTING. - ISSN 0262-8856. - (2025), pp. 105857.1-105857.34. [Epub ahead of print] [10.1016/j.imavis.2025.105857]
On the relevance of patch-based extraction methods for monocular depth estimation
P. Coscia; A. Fusillo; A. Genovese; V. Piuri; F. Scotti
2025
Abstract
Scene geometry estimation from images plays a key role in robotics, augmented reality, and autonomous systems. In particular, Monocular Depth Estimation (MDE) predicts depth from a single RGB image, avoiding the need for expensive sensors. State-of-the-art MDE approaches use deep learning models that process images as a whole, exploiting their spatial information sub-optimally. A recent research direction focuses instead on smaller image patches, since depth information varies across different regions of an image. This approach reduces model complexity and improves performance by capturing finer spatial details. From this perspective, we propose a novel warp-based patch extraction method that corrects perspective camera distortions, and we employ it in tailored training and inference pipelines. Our experimental results show that our patch-based approach outperforms both its full-image-trained counterpart and the classical crop-based patch extraction. With our technique, we obtain general performance improvements over recent state-of-the-art models. Code will be available at https://github.com/AntonioFusillo/PatchMDE
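The abstract does not spell out how the warp is computed. One common way to extract a perspective-corrected patch, sketched below under that assumption and not necessarily the authors' method, is to re-render the patch from a virtual pinhole camera rotated so that its optical axis passes through the patch center. A plain crop, by contrast, keeps the obliquely viewed original image plane. The sketch assumes a known intrinsic matrix `K` and uses NumPy only. All helper names (`intrinsics`, `crop_patch`, `look_at_rotation`, `warp_patch`, `warp_depth`) and the default virtual focal length are hypothetical. Because the virtual camera is rotated, a ground-truth z-depth map must also be re-expressed along the new optical axis, which `warp_depth` does.

```python
import numpy as np


def intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K, in pixel units."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


def crop_patch(img, u0, v0, w, h):
    """Classical crop: an axis-aligned window with no geometric correction."""
    return img[v0:v0 + h, u0:u0 + w]


def look_at_rotation(K, uc, vc):
    """Rotation whose columns are the virtual camera axes expressed in the
    original camera frame; its optical axis passes through pixel (uc, vc)."""
    z = np.linalg.solve(K, np.array([uc, vc, 1.0]))
    z /= np.linalg.norm(z)
    x = np.cross([0.0, 1.0, 0.0], z)  # keep the virtual x-axis level
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z], axis=1)


def warp_patch(img, K, uc, vc, w, h, f=None):
    """Render a w x h patch from a virtual pinhole camera rotated to look at
    (uc, vc), with focal f (default: K's fx) and principal point at the patch
    center. Returns the patch and the source pixel coordinates (u, v)."""
    f = K[0, 0] if f is None else f
    Kp = intrinsics(f, f, (w - 1) / 2.0, (h - 1) / 2.0)
    R = look_at_rotation(K, uc, vc)
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)
    proj = K @ (R @ np.linalg.solve(Kp, pix))  # virtual rays -> source pixels
    u = proj[0] / proj[2]
    v = proj[1] / proj[2]
    # Nearest-neighbour sampling; out-of-image samples are clamped to the
    # border here, whereas a real pipeline would interpolate and mask them.
    ui = np.clip(np.rint(u).astype(int), 0, img.shape[1] - 1)
    vi = np.clip(np.rint(v).astype(int), 0, img.shape[0] - 1)
    patch = img[vi, ui].reshape((h, w) + img.shape[2:])
    return patch, u.reshape(h, w), v.reshape(h, w)


def warp_depth(depth, K, uc, vc, w, h, f=None):
    """Warp a z-depth map and re-express it in the virtual camera frame:
    a rotated camera measures depth along its own optical axis."""
    d, u, v = warp_patch(depth, K, uc, vc, w, h, f)
    rays = np.linalg.solve(K, np.stack([u, v, np.ones_like(u)]).reshape(3, -1))
    points = rays * d.reshape(1, -1)  # 3-D points in the original camera frame
    z_axis = look_at_rotation(K, uc, vc)[:, 2]
    return (z_axis @ points).reshape(h, w)


if __name__ == "__main__":
    K = intrinsics(500.0, 500.0, 320.0, 240.0)
    rgb = np.zeros((480, 640, 3), dtype=np.uint8)
    patch, _, _ = warp_patch(rgb, K, uc=560, vc=400, w=128, h=128)
    print(patch.shape)  # (128, 128, 3)
```

When the patch is centred on the principal point, the rotation is the identity and the warp reduces to a plain crop. The farther the patch lies from the image center, the more the two differ. In this sketch the virtual focal length `f`, which sets the patch's field of view, is a free design parameter.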
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| imavis25b.pdf | Open access | Post-print (accepted manuscript, version accepted by the publisher) | Creative Commons | 10.61 MB | Adobe PDF |
| imavis25b_compressed.pdf | Open access | Post-print (accepted manuscript, version accepted by the publisher) | Creative Commons | 4.96 MB | Adobe PDF |