
EurusPRM-Stage2


Introduction

EurusPRM-Stage2 is trained using Implicit PRM, which provides process rewards at no extra labeling cost: it only requires training an outcome reward model (ORM) on cheaper response-level labels. During inference, implicit process rewards are obtained with forward passes, by calculating the log-likelihood ratio between the trained model and a reference model at each step.
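To make the inference procedure concrete, here is a minimal sketch in plain Python. It assumes that the per-token log-probabilities of a response have already been computed by forward passes of the trained model and of a reference model, that the response has been split into steps with known token boundaries (`step_ends`, an assumed segmentation), and that `beta` is a scaling coefficient. The function names are hypothetical and are not part of the released model's API.

```python
from itertools import accumulate
from typing import List, Sequence


# Hypothetical helpers for illustration; not part of any EurusPRM release.
def implicit_step_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    step_ends: Sequence[int],
    beta: float,
) -> List[float]:
    """Score each reasoning step with a beta-scaled log-likelihood ratio.

    policy_logprobs[i] / ref_logprobs[i]: log p(y_i | y_<i) of response token i
    under the trained model and the reference model (assumed precomputed).
    step_ends: exclusive end index of each step (assumed segmentation).
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    # Per-token implicit reward: beta * (log pi_theta - log pi_ref).
    token_rewards = [beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]

    step_rewards: List[float] = []
    start = 0
    for end in step_ends:
        if not start < end <= len(token_rewards):
            raise ValueError("step_ends must be strictly increasing and within range")
        # A step's reward is the sum of its tokens' rewards.
        step_rewards.append(sum(token_rewards[start:end]))
        start = end
    return step_rewards


def cumulative_rewards(step_rewards: Sequence[float]) -> List[float]:
    # Running sum up to each step; if the steps cover the whole response,
    # the last entry equals beta * log(pi_theta(y) / pi_ref(y)).
    return list(accumulate(step_rewards))


if __name__ == "__main__":
    policy = [-0.5, -1.0, -0.25, -2.0]
    ref = [-1.0, -1.0, -0.75, -1.0]
    steps = implicit_step_rewards(policy, ref, step_ends=[2, 4], beta=1.0)
    print(steps)                      # [0.5, -0.5]
    print(cumulative_rewards(steps))  # [0.5, 0.0]
```

In this toy example the first step, where the trained model assigns higher likelihood than the reference, gets a positive reward, and the second step gets a negative one. Working at the token level and summing per step keeps the sketch independent of how steps are delimited.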


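The key ingredient of Implicit PRM is the reward representation: the outcome reward is parameterized as a scaled log-likelihood ratio between the trained model and a reference model, so that per-step process rewards fall out as partial sums of per-token log-likelihood ratios.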
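Written out, the representation can be stated as follows. This is our transcription of the Implicit PRM formulation, since the original figure did not survive extraction. With a trained causal LM π_θ, a reference model π_ref, and a coefficient β, the outcome reward of a response y and the implicit reward accumulated up to position t are:

```latex
r_\theta(\mathbf{y}) := \beta \log \frac{\pi_\theta(\mathbf{y})}{\pi_{\text{ref}}(\mathbf{y})},
\qquad
q_\theta^t(\mathbf{y}_{<t}, y_t) := \sum_{i=1}^{t} \beta \log \frac{\pi_\theta(y_i \mid \mathbf{y}_{<i})}{\pi_{\text{ref}}(y_i \mid \mathbf{y}_{<i})}
```

Because log π(y) factorizes into per-token conditionals, q_θ at the final position recovers r_θ(y). The reward of an individual step is then the difference between q_θ at the step's last position and q_θ at the previous step's last position.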

Model size: 7.62B params (Safetensors, BF16)