Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Abstract
Temporal consistency is critical in video prediction to ensure that outputs are coherent and free of artifacts. Traditional methods, such as temporal attention and 3D convolution, may struggle with significant object motion and fail to capture long-range temporal dependencies in dynamic scenes. To address this gap, we propose the Tracktention Layer, a novel architectural component that explicitly integrates motion information using point tracks, i.e., sequences of corresponding points across frames. By incorporating these motion cues, the Tracktention Layer enhances temporal alignment and effectively handles complex object motions, maintaining consistent feature representations over time. Our approach is computationally efficient and can be seamlessly integrated into existing models, such as Vision Transformers, with minimal modification. It can be used to upgrade image-only models to state-of-the-art video ones, sometimes outperforming models natively designed for video prediction. We demonstrate this on video depth prediction and video colorization, where models augmented with the Tracktention Layer exhibit significantly improved temporal consistency compared to baselines.
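The abstract describes the layer only at a high level, so the following is a minimal PyTorch sketch of one plausible reading: track-point queries gather ("sample") features from each frame's tokens via cross-attention, a temporal self-attention mixes each track's features across frames, and a second cross-attention scatters ("splats") the result back onto the frame tokens. All shapes, module names, and the specific attention-based sampling/splatting steps are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of a Tracktention-style layer, inferred from the abstract.
# Shapes, module names, and design details are assumptions, not the paper's code.
import torch
import torch.nn as nn


class TracktentionLayer(nn.Module):
    """Aligns per-frame features along point tracks, then mixes them over time.

    x:      (B, T, N, C) per-frame ViT tokens (N spatial tokens per frame)
    tracks: (B, T, P, 2) normalized (x, y) positions of P tracked points
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(2, dim), nn.GELU(), nn.Linear(dim, dim))
        # Cross-attention: track-point queries gather features from frame tokens.
        self.sample_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Self-attention along time, applied independently to each track.
        self.time_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention: frame-token queries scatter track features back.
        self.splat_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, tracks: torch.Tensor) -> torch.Tensor:
        B, T, N, C = x.shape
        P = tracks.shape[2]
        tok = x.reshape(B * T, N, C)                       # per-frame tokens
        q_pos = self.pos_mlp(tracks).reshape(B * T, P, C)  # track-point queries

        # 1) Attentional sampling: pull frame features onto each track point.
        trk, _ = self.sample_attn(q_pos, tok, tok)         # (B*T, P, C)

        # 2) Temporal attention along each track; because tracks follow the
        #    motion, this mixes features of the same physical point over time.
        trk = trk.reshape(B, T, P, C).transpose(1, 2).reshape(B * P, T, C)
        trk, _ = self.time_attn(trk, trk, trk)
        trk = trk.reshape(B, P, T, C).transpose(1, 2).reshape(B * T, P, C)

        # 3) Attentional splatting: push updated track features back to tokens.
        upd, _ = self.splat_attn(tok, q_pos, trk)          # (B*T, N, C)
        return self.norm(x + upd.reshape(B, T, N, C))      # residual update


if __name__ == "__main__":
    layer = TracktentionLayer(dim=64)
    feats = torch.randn(2, 8, 196, 64)  # 2 clips, 8 frames, 14x14 tokens
    pts = torch.rand(2, 8, 32, 2)       # 32 point tracks per clip
    print(layer(feats, pts).shape)      # torch.Size([2, 8, 196, 64])
```

Because the update is residual, a layer like this could in principle be inserted between the blocks of a pretrained image ViT; in practice one would likely zero-initialize the final projection so the image model's behavior is preserved at the start of fine-tuning, consistent with the abstract's claim that image-only models can be upgraded with minimal modification.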
Community
This is an automated message from Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- Stereo Any Video: Temporally Consistent Stereo Matching (2025)
- SplatVoxel: History-Aware Novel View Streaming without Temporal Training (2025)
- EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation (2025)
- Seeing World Dynamics in a Nutshell (2025)
- HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation (2025)
- Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer (2025)
- MITracker: Multi-View Integration for Visual Object Tracking (2025)