Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (Paper 2503.07572)
distilabel, so we implemented PrometheusEval. With PrometheusEval running their 7B variant with vLLM on a single L40 on top of HuggingFaceH4/instruction-dataset, we got the 327 existing prompt-completion pairs evaluated and pushed to the Hub in less than 2 minutes!
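To put that number in perspective, here is a rough back-of-the-envelope sketch in plain Python. It only uses the two figures quoted above (327 pairs, under 2 minutes); the helper name `min_throughput` is made up for illustration and is not part of distilabel. Since the 2 minutes also covers pushing the results to the Hub, the evaluation throughput alone was presumably somewhat higher than this lower bound.

```python
def min_throughput(num_pairs: int, max_seconds: float) -> float:
    """Lower bound on pairs processed per second, given an upper bound on wall time.

    Hypothetical helper for illustration only; not a distilabel function.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")
    return num_pairs / max_seconds


pairs = 327                  # prompt-completion pairs, as quoted in the post
upper_bound_seconds = 2 * 60  # "less than 2 minutes", treated as an upper bound

rate = min_throughput(pairs, upper_bound_seconds)
print(f"at least {rate:.2f} pairs/s, i.e. at most {1 / rate:.2f} s per pair")
```

The per-pair time is an end-to-end average (model loading, generation, and upload included), so it should not be read as a per-request latency.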