A. Monti, A. Porrello, S. Calderara, P. Coscia, L. Ballan, R. Cucchiara. How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting. In: Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, 2022, pp. 6543-6552. IEEE. ISBN 978-1-6654-6946-3. DOI: 10.1109/CVPR52688.2022.00644.

How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting

P. Coscia
2022

Abstract

Accurate prediction of future human positions is an essential task for modern video-surveillance systems. Current state-of-the-art models usually rely on a 'history' of past tracked locations (e.g., 3 to 5 seconds) to predict a plausible sequence of future locations (e.g., up to the next 5 seconds). We argue that this common schema neglects critical traits of realistic applications: since the collection of input trajectories involves machine perception (i.e., detection and tracking), detection and fragmentation errors may accumulate in crowded scenes, leading to tracking drifts. In that case, the model would be fed with corrupted and noisy input data, severely affecting its prediction performance. We therefore focus on delivering accurate predictions when only a few input observations are available, thus potentially lowering the risks associated with automatic perception. To this end, we conceive a novel distillation strategy that transfers knowledge from a teacher network to a student network, the latter fed with fewer observations (just two). We show that a properly defined teacher supervision allows the student network to perform comparably to state-of-the-art approaches that demand more observations. Moreover, extensive experiments on common trajectory forecasting datasets highlight that our student network generalizes better to unseen scenarios.
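To make the teacher-student setup concrete, below is a minimal sketch in PyTorch of the kind of distillation the abstract describes: a teacher that sees a full observation history supervises a student that sees only the last two positions. The architecture (TrajectoryNet), the observation lengths (8 steps for the teacher, 2 for the student), and the loss weighting alpha are illustrative assumptions, not the authors' implementation; the paper defines the teacher supervision more carefully, while here a simple prediction-matching term stands in for it.

    # Sketch of teacher-student distillation for trajectory forecasting.
    # All module names, dimensions, and weightings are assumptions.
    import torch
    import torch.nn as nn

    class TrajectoryNet(nn.Module):
        """GRU encoder that maps observed (x, y) steps to future ones."""
        def __init__(self, hidden=64, pred_len=12):
            super().__init__()
            self.encoder = nn.GRU(input_size=2, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, pred_len * 2)
            self.pred_len = pred_len

        def forward(self, obs):                      # obs: (B, T_obs, 2)
            _, h = self.encoder(obs)                 # h: (1, B, hidden)
            out = self.head(h.squeeze(0))            # (B, pred_len * 2)
            return out.view(-1, self.pred_len, 2)    # (B, pred_len, 2)

    teacher = TrajectoryNet()   # sees the full history (e.g., 8 steps)
    student = TrajectoryNet()   # sees only the last 2 observations

    def distillation_loss(obs, future, alpha=0.5):
        """Ground-truth loss plus a term pulling the student toward the teacher."""
        with torch.no_grad():
            teacher_pred = teacher(obs)              # full 8-step history
        student_pred = student(obs[:, -2:, :])       # only the last 2 steps
        task = nn.functional.mse_loss(student_pred, future)
        distill = nn.functional.mse_loss(student_pred, teacher_pred)
        return alpha * task + (1 - alpha) * distill

    # Example batch: 4 pedestrians, 8 observed steps, 12 future steps.
    obs = torch.randn(4, 8, 2)
    future = torch.randn(4, 12, 2)
    loss = distillation_loss(obs, future)
    loss.backward()   # updates only the student; the teacher is frozen

The key design point, per the abstract, is that the student never needs the long history at inference time: once trained, it predicts from just two observations, reducing exposure to tracking drift accumulated over longer windows.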
Action and event recognition; Motion and tracking; Navigation and autonomous driving; Robot vision
Sector INF/01 - Computer Science
Sector ING-INF/05 - Information Processing Systems
Sector IINF-05/A - Information Processing Systems
Book Part (author)
Files in this record:
File: How_many_Observations_are_Enough_Knowledge_Distillation_for_Trajectory_Forecasting.pdf
Type: Publisher's version/PDF
Format: Adobe PDF
Size: 1.29 MB
Access: restricted (copy available on request)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2434/952719
Citations
  • PMC: not available
  • Scopus: 61
  • Web of Science: 34
  • OpenAlex: not available