Update README.md
README.md
CHANGED
@@ -16,11 +16,11 @@ base_model:
 <img width="80%" src="14b-rl.png">
 </p>

-[
+[Technical Report](https://arxiv.org/pdf/2503.10460)

-[GitHub
+[GitHub Page](https://github.com/Qihoo360/Light-R1)

-[
+[Wandb Log](https://api.wandb.ai/links/seek4-nus/4klmwpqs)

 We introduce Light-R1-14B-DS, the first successful open-source RL attempt on already long-COT finetuned models of similar sizes under a **light** budget.
 Light-R1-14B-DS is also the State-Of-The-Art 14B math model, with AIME24 & 25 scores of 74.0 & 60.2, outperforming many 32B models.
@@ -32,7 +32,7 @@ We have finally seen expected behavior during RL training: simultaneous increase

 Originating from DeepSeek-R1-Distill-Qwen-14B, Light-R1-14B-DS underwent our long-COT RL Post-Training and achieved a new State-Of-The-Art across 14B math models: 74.0 & 60.2 on AIME 24 & 25, respectively.
 Light-R1-14B-DS also performed well on GPQA *without* any specific training.
-We are excited to release this model along with the [technical report](https://
+We are excited to release this model along with the [technical report](https://arxiv.org/pdf/2503.10460), and will continue to perfect our long-COT RL Post-Training.

 ## Usage
 Same as DeepSeek-R1-Distill-Qwen-14B.