Submitted by Howuhh 87 XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning · 6 authors 1
Submitted by Royir 77 Make It Count: Text-to-Image Generation with an Accurate Number of Objects · 6 authors 3
Submitted by Cheng-YANG 55 ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation · 14 authors 2
Submitted by yurakuratov 49 BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack · 7 authors 4
Submitted by tellarin 32 SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages · 61 authors 1
Submitted by Weiyun1025 29 OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text · 40 authors 3
Submitted by wqshao126 25 GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices · 10 authors 1
Submitted by GlyphByT5 22 Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering · 6 authors 2
Submitted by YidaChen 11 Designing a Dashboard for Transparency and Control of Conversational AI · 12 authors 4
Submitted by KevinQHLin 9 VideoGUI: A Benchmark for GUI Automation from Instructional Videos · 8 authors 1
Submitted by wqshao126 9 Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality · 12 authors 1
Submitted by ahans1 8 Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs · 11 authors 1
Submitted by happy0612 7 AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis · 5 authors 1
Submitted by deeptimhe 6 GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors · 4 authors 2
Submitted by amanchadha 5 Decoding the Diversity: A Review of the Indic AI Research Landscape · 5 authors 1
Submitted by kargaranamir 5 MaskLID: Code-Switching Language Identification through Iterative Masking · 3 authors 1