R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning Paper • 2503.05379 • Published 7 days ago • 30
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published 5 days ago • 9
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published 4 days ago • 25
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL Paper • 2503.07536 • Published 4 days ago • 73
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published 4 days ago • 89
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published 5 days ago • 21
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning Paper • 2503.07002 • Published 4 days ago • 36
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning Paper • 2503.07365 • Published 4 days ago • 53
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published 7 days ago • 104