AMD-Hummingbird: Towards an Efficient Text-to-Video Model • Paper • 2503.18559 • Published 2 days ago • 4
Cosmos Transfer1 • Collection • World Foundation Model for Domain Transfer • 5 items • Updated 6 days ago • 11
FFN Fusion: Rethinking Sequential Computation in Large Language Models • Paper • 2503.18908 • Published 1 day ago • 14
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning • Paper • 2503.18013 • Published 3 days ago • 15
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering • Paper • 2503.16422 • Published 6 days ago • 14
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse • Paper • 2503.16365 • Published 6 days ago • 33