Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 3 days ago • 26
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published 10 days ago • 26
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published 6 days ago • 71
SurveyX: Academic Survey Automation via Large Language Models Paper • 2502.14776 • Published 21 days ago • 92
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published 23 days ago • 14
Running 2.24k 2.24k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Dria-Agent-a Collection powerful agentic models built for pythonic function calling • 4 items • Updated 27 days ago • 4
Tiny-Agent-a Collection fast and powerful agentic models designed to run on edge devices. • 6 items • Updated 29 days ago • 7
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 124 • 12
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 124 • 12
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 124
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning Paper • 2411.04983 • Published Nov 7, 2024 • 12
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 108
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published Dec 18, 2024 • 135
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published Dec 2, 2024 • 19