Pretrained Reward Model Classifier
Overview
This is a specialized binary classifier that evaluates text chunks and predicts whether they would be "Chosen" (A) or "Rejected" (B).
How It Works
- Text is split into chunks of exactly 64 tokens using the Qwen 2.5 tokenizer (see the chunking sketch after this list)
- The model judges the chunk inside the judgement region, given the preceding chunks as context, using the input format described below
- Only token IDs 32 and 33 have non-zero weights in the LM head (A = Chosen, B = Rejected)
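The snippet below is a minimal sketch of the 64-token chunking step, assuming the standard Hugging Face tokenizer for Qwen/Qwen2.5-7B; the helper name is illustrative and not part of any official script.

```python
from transformers import AutoTokenizer

CHUNK_SIZE = 64  # exact chunk length in tokens, per the description above

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

def split_into_chunks(text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split text into consecutive chunks of exactly chunk_size tokens
    (the final chunk may be shorter)."""
    token_ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    return [
        tokenizer.decode(token_ids[i : i + chunk_size])
        for i in range(0, len(token_ids), chunk_size)
    ]
```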
Input Format
The model expects input in this precise format:
[Original text from previous 64-token chunks]
<<JUDGEMENT_REGION>>
[Next 64-token chunk to evaluate]
<</JUDGEMENT_REGION>>
<<JUDGEMENT>>
Example
Original paragraph:
The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown. Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
Formatted for prediction:
The city council meeting started promptly at 6 PM with all members present. Mayor Johnson opened by addressing concerns about the new parking regulations downtown.
<<JUDGEMENT_REGION>>
Citizens expressed both support and opposition during the public comment period. The council ultimately voted 4-2 to implement the regulations starting next month.
<</JUDGEMENT_REGION>>
<<JUDGEMENT>>
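For reference, a small sketch of how this prompt could be assembled programmatically. The marker strings are copied verbatim from the format specification above; the function name and exact newline placement are assumptions.

```python
def build_prompt(previous_chunks: list[str], next_chunk: str) -> str:
    """Wrap the next 64-token chunk in the judgement markers,
    preceded by the earlier chunks as context."""
    context = "".join(previous_chunks)
    return (
        f"{context}\n"
        "<<JUDGEMENT_REGION>>\n"
        f"{next_chunk}\n"
        "<</JUDGEMENT_REGION>>\n"
        "<<JUDGEMENT>>"
    )
```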
Output
The model predicts whether the chunk in the JUDGEMENT_REGION would be:
- A: Chosen (preferred content)
- B: Rejected (less preferred content)
The prediction is based only on the relative probabilities of these two tokens.
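Below is a hedged sketch of reading the A/B preference from the final-position logits. It assumes token IDs 32 and 33 correspond to "A" and "B" as stated in "How It Works", and that the model can be loaded as a standard causal LM with transformers; this mirrors generic library usage rather than an official inference script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Quest-AI/pretrain-rm-baseline-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

A_TOKEN_ID, B_TOKEN_ID = 32, 33  # A = Chosen, B = Rejected

@torch.no_grad()
def predict(prompt: str) -> dict:
    """Return the renormalized probabilities of the A and B tokens
    at the position following <<JUDGEMENT>>."""
    inputs = tokenizer(prompt, return_tensors="pt")
    logits = model(**inputs).logits[0, -1]  # logits at the final prompt position
    probs = torch.softmax(logits[[A_TOKEN_ID, B_TOKEN_ID]], dim=-1)
    return {"P(A)": probs[0].item(), "P(B)": probs[1].item()}
```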
Analysis
For practical use, aggregate results across chunks by taking the mean of the log ratio between the two probabilities:
log_ratio = log(P(A) / P(B))
This log ratio approach provides a more stable and interpretable signal across multiple evaluations than using raw probabilities alone.
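A small sketch of that aggregation: compute log(P(A) / P(B)) for each evaluated chunk and average the results. The input is assumed to be a list of dicts like those returned by the predict() sketch above.

```python
import math

def mean_log_ratio(predictions: list[dict]) -> float:
    """Average log(P(A) / P(B)) across all evaluated chunks."""
    log_ratios = [math.log(p["P(A)"] / p["P(B)"]) for p in predictions]
    return sum(log_ratios) / len(log_ratios)
```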
Model: Quest-AI/pretrain-rm-baseline-7b
Base model: Qwen/Qwen2.5-7B