Clémentine Fourrier

clefourrier

AI & ML interests

None yet

Recent Activity

updated a dataset about 2 hours ago
gaia-benchmark/results_public
updated a dataset about 2 hours ago
gaia-benchmark/submissions_public

Organizations

Hugging Face, Long Range Graph Benchmark, Evaluation datasets, BigScience: LMs for Historical Texts, HuggingFaceBR4, Cohere For AI, Open Graph Benchmark, HuggingFaceGECLM, Huggingface Projects, Pretrained Graph Transformers, Graph Datasets, BigCode, Hugging Face H4, InternLM, Vectara, GAIA, Hugging Face Smol Cluster, plfe, Open LLM Leaderboard, Qwen, Secure Learning Lab, Open Life Science AI, LLM360, TTS Eval (OLD), Leaderboard Organization, Bias Leaderboard Development, hallucinations-leaderboard, Demo Leaderboard, Demo leaderboard with an integrated backend, gg-hf, Clinical & Biomedical ML Leaderboards, AIM-Harvard, Women on Hugging Face, LMLLO2, Lighthouz AI, Open Arabic LLM Leaderboard, mx-test, HuggingFaceFW, IBM Granite, HF-contamination-detection, TTS AGI, Leader Board Test Org, Social Post Explorers, hsramall, Open RL Leaderboard, The Fin AI, Open Hebrew LLM's Leaderboard, La Leaderboard, gg-tt, HuggingFaceEval, HP Inc., Novel Challenge, Open LLM Leaderboard Archive, LLHF, SLLHF, lbhf, Inception, nltpt, Lighteval testing org, CléMax, Hugging Face Science, test_org, Coordination Nationale pour l'IA, LeMaterial, open-llm-leaderboard-react, Prompt Leaderboard, wut?, UBC-NLP Collaborations, smolagents, Your Bench, leaderboard explorer, Open R1, SIMS, OpenEvals

Posts 17

The Gemma 3 family is out! While reading the tech report, I found this section really interesting from a methods and scientific-fairness point of view.

Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say.)

For a tech report, it makes a lot of sense to report model performance when the model is used optimally!
On leaderboards, on the other hand, comparisons are apples to apples, but potentially in a suboptimal setup for a given model family (just as some users interact with models suboptimally).

It also contains a cool section (6) on training data memorization rate! It matters to know whether your model will output training data it has seen verbatim: always an issue for privacy, copyright, and so on, but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.
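One common way to estimate a memorization rate is to prompt the model with the prefix of a document and check whether it reproduces the continuation. The sketch below illustrates that idea under stated assumptions: the 50-token prefix/suffix split, the 10% edit-distance tolerance for "approximate" memorization, and the `generate` callable are illustrative choices, not necessarily the exact protocol used in the Gemma 3 report. The toy model and samples are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: given a prompt (token ids) and a length n,
# return the model's (e.g. greedy) continuation of n token ids.
GenerateFn = Callable[[List[int], int], Sequence[int]]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two token sequences (two-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


def memorization_rates(
    samples: Sequence[List[int]],
    generate: GenerateFn,
    prefix_len: int = 50,   # assumed prompt length
    suffix_len: int = 50,   # assumed length of the continuation to compare
    approx_tol: float = 0.1,  # assumed: <= 10% edits counts as approximate
) -> Tuple[float, float]:
    """Return (exact_rate, approximate_rate) over samples long enough to split."""
    exact = approx = total = 0
    for tokens in samples:
        if len(tokens) < prefix_len + suffix_len:
            continue  # too short to split into prompt + target
        prefix = tokens[:prefix_len]
        target = tokens[prefix_len:prefix_len + suffix_len]
        output = list(generate(prefix, suffix_len))[:suffix_len]
        total += 1
        if output == target:
            exact += 1
        # Exact matches have distance 0, so they also count as approximate.
        if edit_distance(output, target) <= approx_tol * suffix_len:
            approx += 1
    if total == 0:
        return 0.0, 0.0
    return exact / total, approx / total


# Hypothetical toy "model" that has memorized one training sequence.
training = [list(range(100))]


def toy_generate(prefix: List[int], n: int) -> List[int]:
    for seq in training:
        if seq[:len(prefix)] == prefix:
            return seq[len(prefix):len(prefix) + n]
    return [0] * n  # no memorized continuation


near_copy = list(range(100))
near_copy[60] = -1  # one token differs inside the target suffix

samples = [list(range(100)), near_copy, list(range(100, 200))]
print(memorization_rates(samples, toy_generate))
```

Here the first sample is reproduced exactly, the near-copy only approximately (one substitution), and the unseen sequence not at all. Tracking both rates is useful for evaluation: if a benchmark item's continuation is regenerated from its prefix, scores on it say little about generalization.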

Articles 34


Fixing Open LLM Leaderboard with Math-Verify

Datasets

None public yet