Aymeric Roucher

m-ric

AI & ML interests

Leading Agents at HF中国镜像站 🤗

Organizations

HF中国镜像站, Orange, Atmos Bank, Hugging Test Lab, Tools, HuggingFaceM4, lecocqassociate, huggingPartyParis, Supreme, FactSet, Propulse Lab, Leaderboard Organization, CGIAR, Aperture Laboratories, AI Energy Score, C&A, Social Post Explorers, Dev Mode Explorers, Agent Collab, SLLHF, Data Agents, HF中国镜像站 Party @ PyTorch Conference, HF中国镜像站 FineVideo, Nerdy Face, HF中国镜像站 Science, Agents Leaderboard, smolagents, HF中国镜像站 Agents Course, Open R1, SIMS

Posts 95

Our new Agentic leaderboard is now live!💥

If you've ever asked which LLM is best for powering agents, we've just made a leaderboard that ranks them all! Built with @albertvillanova, it ranks LLMs powering a smolagents CodeAgent on subsets of various benchmarks. ✅

🏆 GPT-4.5 comes out on top, even beating reasoning models like DeepSeek-R1 or o1. And Claude-3.7-Sonnet is a close second!

The leaderboard can also display the scores of vanilla LLMs (without any agentic setup) on the same benchmarks, which shows the huge improvements that agentic setups bring. 💪

(Note that results will be added manually, so the leaderboard might not always have the latest LLMs)
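
As a rough illustration of the comparison described above, here is a minimal sketch of ranking models by agentic score and reporting the gain over the vanilla score. It is not the leaderboard's actual code: the `Entry` class, the `rank` and `agentic_gain` helpers, the model names, and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    model: str            # hypothetical model identifier
    agentic_score: float  # score when the model powers an agent
    vanilla_score: float  # score of the same model without any agentic setup


def rank(entries):
    """Return the entries sorted by agentic score, best first."""
    return sorted(entries, key=lambda e: e.agentic_score, reverse=True)


def agentic_gain(entry):
    """Absolute improvement of the agentic setup over the vanilla LLM."""
    return entry.agentic_score - entry.vanilla_score


# Placeholder numbers for illustration only, not real leaderboard results.
entries = [
    Entry("model-a", 40.0, 10.0),
    Entry("model-b", 55.0, 20.0),
    Entry("model-c", 47.5, 30.0),
]

for position, entry in enumerate(rank(entries), start=1):
    print(
        f"{position}. {entry.model}: agentic={entry.agentic_score}, "
        f"vanilla={entry.vanilla_score}, gain={agentic_gain(entry):+.1f}"
    )
```

Sorting on the agentic score alone mirrors the idea of a single ranking, while keeping the vanilla score next to it makes the per-model gain easy to read off.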

Articles 10

Trace & Evaluate your Agent with Arize Phoenix