evaluation benchmark against gpt4o, gpt4o-mini, o1-preview, o1-mini

#18
by gweiz - opened

Has anyone tried benchmarking this model against the latest foundation language models?
Is it still on par with the best? Is there a guide on how to run this benchmark?

Defog.ai org

Hi @gweiz, you can follow the instructions in our eval harness repo to evaluate on the latest models (e.g. OpenAI):
https://github.com/defog-ai/sql-eval/?tab=readme-ov-file#openai
