Can you release the online RL prompts?

by sparsh35 - opened about 23 hours ago


As I read in the tech report, you used sampling accuracy criteria to find the hard or orthogonal dataset for online RL (GRPO) using Qwen 32B R1. Could you please release that dataset as well?
Thanks, and congrats on the release.
All the best.
