ForeSight

Multi-View Streaming Joint Object Detection and Trajectory Forecasting

Accepted at ICCV 2025

Sandro Papais, Letian Wang, Brian Cheong, Steven L. Waslander

University of Toronto

Comparison of temporal perception approaches and where ForeSight fits

ForeSight unifies 3D object detection and motion forecasting in a single streaming framework. A shared, bidirectional memory lets forecasts inform detections and vice versa, improving accuracy and temporal consistency in complex scenes.

High-level overview of ForeSight's joint detection-forecasting with shared memory
A comparison of temporal learning approaches; ForeSight propagates motion forecasts forward for reuse in both detection and forecasting.
ForeSight pipeline: multi-view encoders, detection queries, forecast queries, and a joint streaming memory.

The transformer-based design eliminates explicit tracking, reducing error propagation. ForeSight streams queries across frames, using past detections and forecasts as priors to strengthen current predictions.
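The streaming idea described above can be sketched as a small memory that carries last frame's queries forward and places them at their forecasted positions before they are reused. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `MemoryEntry`, `StreamingMemory`, and `build_queries`, the top-k retention rule, and the use of the first forecast waypoint as the positional prior are all hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MemoryEntry:
    """One object query produced at the previous frame (hypothetical layout)."""
    embedding: np.ndarray  # (D,) query feature
    center: np.ndarray     # (2,) BEV position at the frame it was produced
    forecast: np.ndarray   # (T, 2) forecasted BEV waypoints, one per future frame
    score: float           # detection confidence


class StreamingMemory:
    """Keeps the highest-scoring queries from the previous frame."""

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.entries = []

    def update(self, outputs):
        # Assumption: retain only the top-`capacity` queries by score.
        ranked = sorted(outputs, key=lambda e: e.score, reverse=True)
        self.entries = ranked[: self.capacity]

    def priors(self):
        # Each stored query is moved to its first forecast waypoint,
        # i.e. where the object is expected to be in the current frame.
        if not self.entries:
            return np.zeros((0, self.dim)), np.zeros((0, 2))
        embeddings = np.stack([e.embedding for e in self.entries])
        centers = np.stack([e.forecast[0] for e in self.entries])
        return embeddings, centers


def build_queries(memory, fresh_embeddings, fresh_centers):
    """Concatenate propagated memory queries with fresh queries for this frame."""
    mem_emb, mem_ctr = memory.priors()
    embeddings = np.concatenate([mem_emb, fresh_embeddings], axis=0)
    centers = np.concatenate([mem_ctr, fresh_centers], axis=0)
    return embeddings, centers
```

In a real model the combined queries would then go through the transformer decoder, and its outputs would be written back with `update` for the next frame. No explicit track association step appears anywhere in this loop, which is the property the paragraph above highlights.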

Qualitative result with pedestrians and vehicles, showing detections and forecasted trajectories
Qualitative result with occluded parked cars detected via temporal memory

On nuScenes, ForeSight sets a new bar for end-to-end prediction accuracy (EPA), outperforming UniAD by 9.3%, and improves detection by 2.1% mAP over StreamPETR, while remaining efficient for streaming inference.
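EPA is not defined on this page. As a rough sketch, assuming the formulation used in prior end-to-end forecasting work (matched agents whose forecast falls within a distance threshold count as hits, and false positives are penalized by a weight `alpha`), the per-class score could be computed as below. The function name, the default `alpha = 0.5`, and the argument names are assumptions for illustration; the paper's evaluation protocol should be consulted for the exact settings.

```python
def end_to_end_prediction_accuracy(num_hits, num_false_positives,
                                   num_ground_truth, alpha=0.5):
    """EPA sketch: (hits - alpha * false positives) / ground-truth count.

    num_hits: matched detections whose forecast is within the hit threshold.
    num_false_positives: detections with no matching ground-truth agent.
    num_ground_truth: number of ground-truth agents of this class.
    alpha: false-positive penalty weight (assumed default).
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    return (num_hits - alpha * num_false_positives) / num_ground_truth
```

Because false positives subtract from the score, the metric rewards detection and forecasting quality together: a model cannot raise EPA by emitting many low-quality detections.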

BibTeX

@inproceedings{papais2025foresight,
  author    = {Papais, Sandro and Wang, Letian and Cheong, Brian and Waslander, Steven L.},
  title     = {ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025}
}