Trained for one epoch on ultrafeedback_binarized using cDPO. Full evaluation is pending.
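
For readers unfamiliar with cDPO (conservative DPO), the per-example loss can be sketched as below. This is a minimal illustration of the general technique, not the training code used for this model: the function names and the `beta` and `label_smoothing` defaults are assumptions, since the card does not state the hyperparameters. The idea is that the standard DPO loss is mixed with its label-flipped counterpart, which assumes a small fraction of preference labels may be wrong.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def cdpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,             # assumed value, not from the model card
    label_smoothing: float = 0.1,  # assumed value, not from the model card
) -> float:
    """Conservative DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy and the frozen reference model.
    """
    # Implicit reward margin between the chosen and rejected responses.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    logits = beta * margin
    # Weight the DPO term by (1 - eps) and the label-flipped term by eps.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(logits)
        - label_smoothing * log_sigmoid(-logits)
    )
```

With `label_smoothing=0.0` this reduces to the ordinary DPO loss. With a positive value, the loss no longer pushes the margin toward infinity, which is the intended regularizing effect.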

Some initial benchmark results:

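| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| hellaswag | 0 | acc | 0.6621 | ± 0.0047 |
| | | acc_norm | 0.8525 | ± 0.0035 |
| arc_challenge | 0 | acc | 0.6348 | ± 0.0141 |
| | | acc_norm | 0.6698 | ± 0.0137 |
| winogrande | 0 | acc | 0.7861 | ± 0.0115 |
| gsm8k | 0 | acc | 0.5694 | ± 0.0136 |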

Model size: 7.24B params. Tensor type: BF16 (Safetensors).