Can you release the online RL prompts?
#2, opened by sparsh35
As I read in the tech report, you used a sampling-accuracy criterion to find the hard or orthogonal dataset for online RL (GRPO) using Qwen 32B R1. Could you please release that dataset as well?
Thanks, and congrats on the release.
All the best.