Collections
Collections including paper arxiv:2503.16219
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 28
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Paper • 2405.14333 • Published • 40
Transformers Can Do Arithmetic with the Right Embeddings
Paper • 2405.17399 • Published • 53
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
Paper • 2405.18991 • Published • 12
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 27
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 43
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 22
-
Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 30
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 28
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Paper • 2503.16219 • Published • 43
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 48
-
RuCCoD: Towards Automated ICD Coding in Russian
Paper • 2502.21263 • Published • 126
Unified Reward Model for Multimodal Understanding and Generation
Paper • 2503.05236 • Published • 110
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
Paper • 2503.05179 • Published • 43
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 25
-
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Paper • 2503.00735 • Published • 19
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 98
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 25
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning
Paper • 2503.05379 • Published • 33
-
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 67
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 147
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper • 2502.06781 • Published • 60
LIMO: Less is More for Reasoning
Paper • 2502.03387 • Published • 60
-
OpenAI o1 System Card
Paper • 2412.16720 • Published • 33
LearnLM: Improving Gemini for Learning
Paper • 2412.16429 • Published • 22
NILE: Internal Consistency Alignment in Large Language Models
Paper • 2412.16686 • Published • 8
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38