-
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 18 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 83 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 24 -
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 26
Collections
Discover the best community collections!
Collections including paper arxiv:2503.04625
-
On Domain-Specific Post-Training for Multimodal Large Language Models
Paper • 2411.19930 • Published • 27 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 83 -
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
Paper • 2503.02495 • Published • 8 -
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
Paper • 2503.01933 • Published • 11
-
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 42 -
SafeArena: Evaluating the Safety of Autonomous Web Agents
Paper • 2503.04957 • Published • 18 -
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper • 2503.04808 • Published • 16 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 83
-
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 83 -
Towards an AI co-scientist
Paper • 2502.18864 • Published • 42 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 68 -
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Paper • 2502.14499 • Published • 179
-
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 83 -
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Paper • 2503.04724 • Published • 59 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 67 -
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
Paper • 2503.07572 • Published • 26
-
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Paper • 2502.14768 • Published • 45 -
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Paper • 2502.12853 • Published • 28 -
Diverse Inference and Verification for Advanced Reasoning
Paper • 2502.09955 • Published • 17 -
Distillation Scaling Laws
Paper • 2502.08606 • Published • 46
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 56 -
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!
Paper • 2502.07374 • Published • 36 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 142 -
S*: Test Time Scaling for Code Generation
Paper • 2502.14382 • Published • 60
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 25 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 26 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 108 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 106 -
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Paper • 2412.06559 • Published • 80 -
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Paper • 2412.15084 • Published • 13 -
The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 92
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 58 -
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 52 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 42 -
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 57