Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 22 days ago • 13
Phantom: Subject-consistent video generation via cross-modal alignment Paper • 2502.11079 • Published 26 days ago • 52
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published Feb 10 • 33
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion Paper • 2502.08590 • Published 30 days ago • 40
Enhance-A-Video: Better Generated Video for Free Paper • 2502.07508 • Published about 1 month ago • 21
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 113
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3 • 189
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 31
Tulu 3 Models Collection All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated about 18 hours ago • 93
TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space Paper • 2501.12224 • Published Jan 21 • 46
Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise Paper • 2501.08331 • Published Jan 14 • 20
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos Paper • 2501.12375 • Published Jan 21 • 22
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 54
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments Paper • 2501.10893 • Published Jan 18 • 24
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 64
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation Paper • 2501.09433 • Published Jan 16 • 18