Submitted by Elizaveta · 55 · When Less is Enough: Adaptive Token Reduction for Efficient Image Representation · 3 authors · 2
Submitted by VentureZJ · 45 · MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving · 9 authors · 2
Submitted by VentureZJ · 38 · MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization · 6 authors · 2
Submitted by IranQin · 33 · RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints · 8 authors · 2
Submitted by Epiphqny · 28 · Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation · 7 authors · 4
Submitted by ydeng9 · 20 · OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement · 6 authors · 2
Submitted by akhaliq · 18 · Modifying Large Language Model Post-Training for Diverse Creative Writing · 5 authors · 2
Submitted by akhaliq · 13 · TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting · 7 authors · 2
Submitted by JacobYuan · 11 · MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems · 8 authors · 3
Submitted by akhaliq · 8 · FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models · 7 authors · 3
Submitted by Guan123 · 8 · ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering · 8 authors · 2
Submitted by hitsmy · 7 · From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration · 4 authors · 2
Submitted by ChengmingX · 6 · When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO · 8 authors · 2
Submitted by ZhaochongAn · 5 · Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model · 7 authors · 1