EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7, 2024 • 22
PALO: A Polyglot Large Multimodal Model for 5B People Paper • 2402.14818 • Published Feb 22, 2024 • 24
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 126
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10, 2024 • 25
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published Jan 14 • 57
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14 • 33
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23 • 37
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 112
Teaching Language Models to Critique via Reinforcement Learning Paper • 2502.03492 • Published Feb 5 • 24
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation Paper • 2502.14846 • Published 21 days ago • 13
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published 21 days ago • 12
KV-Edit: Training-Free Image Editing for Precise Background Preservation Paper • 2502.17363 • Published 17 days ago • 33
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published 14 days ago • 29
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning Paper • 2502.19735 • Published 15 days ago • 7
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models Paper • 2503.03962 • Published 8 days ago • 3
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model Paper • 2503.05132 • Published 7 days ago • 45
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 3 days ago • 51
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published 4 days ago • 21
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published 5 days ago • 9
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published 2 days ago • 23
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 3 days ago • 23