Haoxiang-Wang committed on
Commit 0a31e34 · verified · 1 Parent(s): d770fe5

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -5,7 +5,7 @@ tags: []
  
  This is the SFT checkpoint used for the project [RLHFlow/Online-RLHF](https://github.com/RLHFlow/Online-RLHF)
  
- * **Technical Report**: [RLHF Workflow: From Reward Modeling to Online RLHF](https://arxiv.org/pdf/2405.07863)
+ * **Paper**: [RLHF Workflow: From Reward Modeling to Online RLHF](https://arxiv.org/pdf/2405.07863) (Published in TMLR, 2024)
  * **Authors**: Hanze Dong*, Wei Xiong*, Bo Pang*, Haoxiang Wang*, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
  * **Code**: https://github.com/RLHFlow/Online-RLHF
  