@m-ric on HF中国镜像站: "Our new Agentic leaderboard is now live!💥 If you ever asked which LLM is…"

m-ric

posted an update 3 days ago

Our new Agentic leaderboard is now live!💥

If you've ever wondered which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova, it ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

🏆 GPT-4.5 comes out on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard also lets you display the scores of vanilla LLMs (without any agentic setup) on the same benchmarks, which shows the huge improvements brought by agentic setups. 💪
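
To make the agentic vs. vanilla comparison concrete, here is a minimal sketch of how such a ranking could be computed. It assumes a plain average over per-benchmark accuracies; the actual leaderboard's aggregation may differ. The model names, benchmark names, and numbers are hypothetical placeholders, not real results.

```python
from statistics import mean

# Hypothetical placeholder data: NOT real leaderboard results.
# Each model has per-benchmark accuracies for two setups.
scores = {
    "model_a": {
        "agentic": {"bench_1": 0.75, "bench_2": 0.5},
        "vanilla": {"bench_1": 0.25, "bench_2": 0.25},
    },
    "model_b": {
        "agentic": {"bench_1": 0.5, "bench_2": 0.5},
        "vanilla": {"bench_1": 0.5, "bench_2": 0.25},
    },
}


def average(per_benchmark):
    """Unweighted mean over benchmarks (an assumed aggregation)."""
    return mean(per_benchmark.values())


def rank(scores, setup="agentic"):
    """Return (model, average score) pairs, best first, for one setup."""
    pairs = [(model, average(s[setup])) for model, s in scores.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def agentic_gain(scores):
    """Average agentic score minus average vanilla score, per model."""
    return {
        model: average(s["agentic"]) - average(s["vanilla"])
        for model, s in scores.items()
    }


if __name__ == "__main__":
    for model, score in rank(scores, "agentic"):
        print(f"{model}: {score:.3f}")
    print(agentic_gain(scores))
```

Note that the ranking can change between setups: with these placeholder numbers, `model_b` leads in the vanilla setup while `model_a` leads once both run as agents.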

(Note that results will be added manually, so the leaderboard might not always include the latest LLMs.)

DataSoul

3 days ago

For immediate-response tasks like these, models that need longer thinking processes don't seem to have an advantage. Perhaps more series of general-purpose models should be added?
