Can you release the online RL prompts?

#2, opened by sparsh35

In the tech report, you mention that you used a sampling-accuracy criterion to select the hard (or orthogonal) prompts for online RL (GRPO) with Qwen 32B R1. Could you please release that dataset as well?
Thanks, and congrats on the release.
All the best.