Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2, trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
---|---|---|
SELM-Phi-3-mini-4k-instruct-iter-3 | 27.98 | 8.32 |
SELM-Phi-3-mini-4k-instruct-iter-2 | 26.79 | 8.44 |
SELM-Phi-3-mini-4k-instruct-iter-1 | 27.33 | 8.37 |
Phi-3-mini-4k-instruct | 23.05 | 8.12 |
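
To make the comparison with the base model easier to read, the scores above can be turned into per-iteration gains. The snippet below is a minimal sketch that only re-uses the numbers from the table; the helper name `gains_over_base` is hypothetical and not part of any released code.

```python
# Scores copied from the table above:
# (AlpacaEval 2.0 LC win rate, MT-Bench average).
scores = {
    "SELM-Phi-3-mini-4k-instruct-iter-3": (27.98, 8.32),
    "SELM-Phi-3-mini-4k-instruct-iter-2": (26.79, 8.44),
    "SELM-Phi-3-mini-4k-instruct-iter-1": (27.33, 8.37),
    "Phi-3-mini-4k-instruct": (23.05, 8.12),
}
BASE = "Phi-3-mini-4k-instruct"


def gains_over_base(scores, base):
    """Return each model's score difference relative to the base model.

    Hypothetical helper for illustration; differences are rounded to
    two decimals to match the precision of the table.
    """
    base_alpaca, base_mt = scores[base]
    return {
        name: (round(alpaca - base_alpaca, 2), round(mt - base_mt, 2))
        for name, (alpaca, mt) in scores.items()
        if name != base
    }


gains = gains_over_base(scores, BASE)
for name, (d_alpaca, d_mt) in gains.items():
    print(f"{name}: {d_alpaca:+.2f} LC WR, {d_mt:+.2f} MT-Bench")
```

All three SELM iterations come out ahead of the base model on both benchmarks, though the two metrics do not peak at the same iteration.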
Our model also ranks highly on WildBench! 🔥
The following hyperparameters were used during training:
Base model: microsoft/Phi-3-mini-4k-instruct