Multi-Turn Code Generation Through Single-Step Rewards Paper • 2502.20380 • Published 15 days ago • 30
Simple Guidance Mechanisms for Discrete Diffusion Models Paper • 2412.10193 • Published Dec 13, 2024 • 1
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 203
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation Paper • 2412.13649 • Published Dec 18, 2024 • 20
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up Paper • 2412.16112 • Published Dec 20, 2024 • 23
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published Dec 20, 2024 • 38
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning Paper • 2412.16849 • Published Dec 22, 2024 • 9
NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published Dec 21, 2024 • 8
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding Paper • 2412.17295 • Published Dec 23, 2024 • 9
Agent-SafetyBench: Evaluating the Safety of LLM Agents Paper • 2412.14470 • Published Dec 19, 2024 • 12
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World Paper • 2412.17589 • Published Dec 23, 2024 • 12
Outcome-Refining Process Supervision for Code Generation Paper • 2412.15118 • Published Dec 19, 2024 • 19
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought Paper • 2412.17498 • Published Dec 23, 2024 • 22