ForeSight: Multi-View Streaming Joint Object Detection and Trajectory Forecasting

Comparison of temporal perception approaches and where ForeSight fits

ForeSight unifies 3D object detection and motion forecasting in a single streaming framework. A shared, bidirectional memory lets forecasts inform detections and vice versa, improving accuracy and temporal consistency in complex scenes.
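The following is a minimal sketch of a shared, bidirectional memory, built on strong simplifying assumptions. Each frame writes both detection and forecast query embeddings into one fixed-length FIFO memory. Any query then reads every stored entry through scaled dot-product attention, so detection entries can inform forecasting and forecast entries can inform detection. The names `SharedMemory`, `write`, `entries`, `attend`, and `max_frames` are hypothetical. Keys double as values, with no learned projections. Nothing here claims to reproduce ForeSight's actual memory layout or attention design.

```python
# Hypothetical toy model of a shared detection/forecast memory,
# not ForeSight's implementation.
import math
from collections import deque
from typing import Deque, List, Tuple

Vector = List[float]
Entry = Tuple[str, Vector]  # (source head, embedding)


class SharedMemory:
    """FIFO memory holding detection and forecast query embeddings
    from the most recent `max_frames` frames."""

    def __init__(self, max_frames: int) -> None:
        # deque(maxlen=...) silently drops the oldest frame once full.
        self.frames: Deque[List[Entry]] = deque(maxlen=max_frames)

    def write(self, det: List[Vector], fut: List[Vector]) -> None:
        # Tag each embedding with the head that produced it.
        self.frames.append([("det", v) for v in det] + [("fut", v) for v in fut])

    def entries(self) -> List[Entry]:
        # Flatten all stored frames, oldest first.
        return [entry for frame in self.frames for entry in frame]


def attend(query: Vector, memory: SharedMemory) -> Vector:
    """Single-head scaled dot-product attention of one query over every memory
    entry, whichever head wrote it. Keys double as values (no projections)."""
    keys = [vec for _, vec in memory.entries()]
    if not keys:
        return list(query)  # empty memory: nothing to attend to
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [sum(w * key[i] for w, key in zip(weights, keys)) / total
            for i in range(len(query))]


mem = SharedMemory(max_frames=2)
mem.write(det=[[1.0, 0.0]], fut=[[0.0, 1.0]])  # frame t-2 (evicted below)
mem.write(det=[[1.0, 0.0]], fut=[[0.0, 1.0]])  # frame t-1
mem.write(det=[[1.0, 0.0]], fut=[[0.0, 1.0]])  # frame t
print(len(mem.frames))          # 2
print(attend([0.0, 0.0], mem))  # [0.5, 0.5]
```

A bounded FIFO keeps the per-frame memory footprint and attention cost constant regardless of sequence length, which is the property streaming inference needs.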

High-level overview of ForeSight's joint detection and forecasting with shared memory, compared with other temporal learning approaches. ForeSight propagates motion forecasts forward so that both detection and forecasting can reuse them.

ForeSight pipeline: multi-view encoders, detection queries, forecast queries, and a joint streaming memory.

The transformer-based design eliminates explicit tracking and thereby reduces the error propagation that a separate tracking stage would introduce. ForeSight streams queries across frames, using past detections and forecasts as priors to strengthen current predictions.
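The following toy sketch shows how forecasts can act as priors in a streaming loop, assuming a bird's-eye-view point representation. The most confident queries from the previous frame are carried forward and placed at their own one-step forecast. The current frame's observations then correct them, and the correction is folded back into the motion estimate used for the next forecast. A constant-velocity rollout stands in for the learned forecasting head, and nearest-observation snapping within a radius stands in for the transformer decoder. ForeSight itself learns this interaction without an explicit tracking step, so the association logic here exists only to make the data flow visible. All names (`Query`, `propagate`, `refine`, `top_k`, `radius`, `decay`) are hypothetical.

```python
# Hypothetical toy model of forecast-seeded query propagation,
# not ForeSight's implementation.
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Query:
    center: Point    # bird's-eye-view (x, y) position, meters
    velocity: Point  # (vx, vy), meters per frame
    score: float     # confidence in [0, 1]

    def forecast(self, steps: int) -> List[Point]:
        # Stand-in for a learned forecasting head: constant-velocity rollout.
        x, y = self.center
        vx, vy = self.velocity
        return [(x + vx * k, y + vy * k) for k in range(1, steps + 1)]


def propagate(queries: List[Query], top_k: int) -> List[Query]:
    """Keep the top_k most confident queries and place each one at its own
    one-step forecast, which becomes its prior for the next frame."""
    kept = sorted(queries, key=lambda q: q.score, reverse=True)[:top_k]
    return [Query(q.forecast(1)[0], q.velocity, q.score) for q in kept]


def refine(priors: List[Query], observations: List[Point],
           radius: float, decay: float = 0.5) -> List[Query]:
    """Correct each prior with the nearest observation within `radius` and fold
    the correction back into its velocity; unmatched priors lose confidence."""
    refined = []
    for q in priors:
        best = min(observations, key=lambda o: math.dist(o, q.center), default=None)
        if best is not None and math.dist(best, q.center) <= radius:
            # The prior was last_center + velocity, so the observed
            # displacement is velocity + (best - prior).
            vel = (q.velocity[0] + best[0] - q.center[0],
                   q.velocity[1] + best[1] - q.center[1])
            refined.append(Query(best, vel, q.score))
        else:
            refined.append(Query(q.center, q.velocity, q.score * decay))
    return refined


history = [Query((0.0, 0.0), (1.0, 0.0), 0.9), Query((10.0, 10.0), (0.0, 0.0), 0.1)]
priors = propagate(history, top_k=1)           # one prior, at (1.0, 0.0)
current = refine(priors, [(1.5, 0.0)], radius=1.0)
print(current[0].center, current[0].velocity)  # (1.5, 0.0) (1.5, 0.0)
print(current[0].forecast(2))                  # [(3.0, 0.0), (4.5, 0.0)]
```

Decaying unmatched priors rather than dropping them lets a briefly unobserved object persist for a few frames. This loosely mirrors how temporal memory can help with occluded objects.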

Qualitative result: detections and forecasted trajectories for pedestrians and vehicles.

Qualitative result: occluded parked cars detected using temporal memory.

On nuScenes, ForeSight achieves state-of-the-art end-to-end prediction accuracy (EPA), outperforming UniAD by 9.3%. It also improves detection by 2.1% mAP over StreamPETR, while remaining efficient enough for streaming inference.
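EPA couples detection and forecasting quality. It rewards predicted agents that are both correctly detected and correctly forecast, and it penalizes false-positive agents. The sketch below follows the definition introduced with ViP3D and adopted by UniAD, as we understand it: EPA = (hits - alpha * false positives) / ground-truth agents. The penalty weight `alpha = 0.5` and the 2 m minFDE hit threshold are assumptions here and should be checked against the official evaluation code before comparing numbers. Matching predicted agents to ground truth is assumed to happen upstream. The helper names `count_hits` and `epa` are hypothetical.

```python
# Hypothetical sketch of an EPA-style metric; alpha and threshold are assumptions.
from typing import List


def count_hits(matched_min_fde: List[float], threshold: float = 2.0) -> int:
    """Count matched agents whose best-of-K final displacement error (meters)
    is below the threshold. The 2.0 m default is an assumption."""
    return sum(1 for err in matched_min_fde if err < threshold)


def epa(num_hits: int, num_false_pos: int, num_gt: int, alpha: float = 0.5) -> float:
    """EPA = (hits - alpha * false positives) / number of ground-truth agents.
    alpha = 0.5 is an assumed default; false positives lower the score."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return (num_hits - alpha * num_false_pos) / num_gt


hits = count_hits([0.4, 1.1, 1.9, 3.2])      # 3 of 4 matched agents are hits
print(epa(hits, num_false_pos=2, num_gt=5))  # 0.4
```

Because false positives subtract from the numerator, a model cannot raise EPA simply by emitting more agents. Detection precision therefore matters as much as forecast accuracy.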


BibTeX