Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 7 days ago • 104
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published 8 days ago • 60
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 8 days ago • 78
Mobius: Text to Seamless Looping Video Generation via Latent Shift Paper • 2502.20307 • Published 15 days ago • 17
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published 15 days ago • 20
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published 15 days ago • 27
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts Paper • 2502.20395 • Published 15 days ago • 44
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published 15 days ago • 58
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization Paper • 2502.17540 • Published 18 days ago • 2
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published 16 days ago • 19
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published 16 days ago • 26
Language Models' Factuality Depends on the Language of Inquiry Paper • 2502.17955 • Published 17 days ago • 30
GHOST 2.0: generative high-fidelity one shot transfer of heads Paper • 2502.18417 • Published 17 days ago • 63
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding Paper • 2502.19400 • Published 16 days ago • 43