Submitted by akhaliq 44 NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? · 4 authors 3
Submitted by schrodingers-tiger 24 Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes · 6 authors 5
Submitted by wangyulong 18 Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning · 4 authors 4
Submitted by Lin-Chen 14 VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models · 12 authors 3
Submitted by akhaliq 12 DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation · 5 authors 2
Submitted by akhaliq 10 Animate3D: Animating Any 3D Model with Multi-view Video Diffusion · 6 authors 2
Submitted by davanstrien 9 FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models · 9 authors 2
Submitted by akhaliq 9 YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus · 2 authors 4
Submitted by akhaliq 8 From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients · 7 authors 2
Submitted by ChenMnZ 8 EfficientQAT: Efficient Quantization-Aware Training for Large Language Models · 8 authors 3
Submitted by ZehanWang 7 OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces · 8 authors 3
Submitted by jhauret 4 Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors · 7 authors 2
Submitted by yxdyc 4 Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development · 7 authors 2
Submitted by Mingyu111 1 Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models · 15 authors 2