Submitted by yuchenlin 24 The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism · 4 authors 4
Submitted by shumingma 23 Q-Sparse: All Large Language Models can be Fully Sparsely-Activated · 4 authors 3
Submitted by tuvu 15 Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation · 6 authors 8
Submitted by cheryyunl 11 Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion · 6 authors 2
Submitted by akhaliq 8 Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity · 4 authors 2
Submitted by rshcao 7 Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? · 23 authors 2
Submitted by akhaliq 6 LAB-Bench: Measuring Capabilities of Language Models for Biology Research · 9 authors 2
Submitted by Paranioar 6 SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning · 7 authors 2
Submitted by davanstrien 5 MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models · 11 authors 2
Submitted by akhaliq 5 Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models · 7 authors 2