Clémentine Fourrier

clefourrier

http://clefourrier.github.io

AI & ML interests

None yet

Recent Activity

updated a dataset about 2 hours ago

gaia-benchmark/results_public

updated a dataset about 2 hours ago

gaia-benchmark/submissions_public

new activity about 22 hours ago

gaia-benchmark/leaderboard:Space is failing to load

Organizations

Posts 17

Post

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, comparisons will be apples to apples, but in a potentially suboptimal way for a given model family (just as some users interact suboptimally with models).

It also contains a cool section (6) on training data memorization rate! It's important to see whether your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.

Articles 34

Article

Fixing Open LLM Leaderboard with Math-Verify

Collections 2

models 2

clefourrier/graphormer-base-pcqm4mv1

Graph Machine Learning • Updated Feb 7, 2023 • 86 • 5

clefourrier/graphormer-base-pcqm4mv2

Graph Machine Learning • Updated Feb 7, 2023 • 983 • 69

datasets

None public yet

Collections 2

Open LLM Leaderboard

Big Code Models Leaderboard

Chatbot Arena Leaderboard

LLM-Perf Leaderboard

facebook/anli

codeparrot/apps

allenai/ai2_arc

EleutherAI/asdiv

Papers 8

spaces 1

Backend
