Update README.md
README.md
CHANGED
@@ -27,7 +27,7 @@ model-index:
 
 ## Model Summary
 
-`phi-2-dpo` is an instruction-tuned model from an earlier SFT model [`phi-2-sft`](https://huggingface.co/lxuechen/phi-2-sft). Direct preference optimization (DPO) is used for fine-tuning on the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
+`phi-2-dpo` is an instruction-tuned model from an earlier SFT model [`phi-2-sft`](https://huggingface.co/lxuechen/phi-2-sft). Direct preference optimization (DPO) is used for fine-tuning on a 10k subset of the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).
 
 The purpose of the experiment is to understand the quality of the pre-trained Phi-2 model. The good news is that `phi-2-dpo` can follow open-ended user instructions well.
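For reference, a minimal inference sketch using the standard `transformers` API. The repo id `lxuechen/phi-2-dpo` is assumed here (mirroring the linked `phi-2-sft` repo), as is a plain-text prompt format; the model card itself does not pin either down.

```python
# Minimal sketch: load the DPO-tuned checkpoint and generate a completion.
# Assumptions: the checkpoint is published as "lxuechen/phi-2-dpo" and loads
# with the stock AutoModelForCausalLM API (older transformers releases may
# additionally need trust_remote_code=True for Phi-2-based models).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lxuechen/phi-2-dpo"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "Explain direct preference optimization in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the example deterministic; sampling also works.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```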