Track, rank, and evaluate the chain-of-thought (CoT) quality of open LLMs
Visualize model performance with interactive plots and tables