Availability of LLM evaluations like in the HF blog post

#24
by sjrhuschlee - opened

Hello, I just saw this blog post https://huggingface.co/blog/zero-shot-eval-on-the-hub, which I am really excited about! I wanted to ask when the example from the blog post will be available in the HF leaderboards?

Evaluation on the Hub org

Hi @sjrlee! Great feature request - gently pinging @Tristan to add this task to the leaderboards so people can view the scores from the LLM evaluations on https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=-any-

Evaluation on the Hub org

Hi @sjrlee, you can find the evaluations coming from the zero-shot pipeline under the text-generation task on the leaderboards, e.g. https://huggingface.co/spaces/autoevaluate/leaderboards?dataset=mathemakitten%2Fwinobias_antistereotype_test

Feel free to close this issue if that addresses your query.

Thanks, @lewtun! That is great to see. Before I close this, I wanted to ask whether there are plans to add the results for all of the model sizes shown in the blog post?
