Safetensors
qwen2
dyc-sh committed · Commit 7b99dad · verified · 1 Parent(s): d6ee6e3

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -16,11 +16,11 @@ base_model:
 <img width="80%" src="14b-rl.png">
 </p>
 
-[technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf)
+[Technical Report](https://arxiv.org/pdf/2503.10460)
 
-[GitHub page](https://github.com/Qihoo360/Light-R1)
+[GitHub Page](https://github.com/Qihoo360/Light-R1)
 
-[wandb log](https://api.wandb.ai/links/seek4-nus/4klmwpqs)
+[Wandb Log](https://api.wandb.ai/links/seek4-nus/4klmwpqs)
 
 We introduce Light-R1-14B-DS, the first successful open-source RL attempt on already long-COT-finetuned models of similar size under a **light** budget.
 Light-R1-14B-DS is also the State-Of-The-Art 14B math model, with AIME24 & 25 scores of 74.0 & 60.2, outperforming many 32B models.
@@ -32,7 +32,7 @@ We have finally seen expected behavior during RL training: simultaneous increase
 
 Originating from DeepSeek-R1-Distill-Qwen-14B, Light-R1-14B-DS underwent our long-COT RL Post-Training and achieved a new State-Of-The-Art across 14B math models: 74.0 & 60.2 on AIME 24 & 25 respectively.
 Light-R1-14B-DS also performed well on GPQA *without* any specific training.
-We are excited to release this model along with the [technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf), and will continue to perfect our long-COT RL Post-Training.
+We are excited to release this model along with the [technical report](https://arxiv.org/pdf/2503.10460), and will continue to perfect our long-COT RL Post-Training.
 
 ## Usage
 Same as DeepSeek-R1-Distill-Qwen-14B.
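
Since usage is the same as DeepSeek-R1-Distill-Qwen-14B, a minimal 🤗 Transformers sketch looks like the following. The Hub id `qihoo360/Light-R1-14B-DS` is an assumption (check the repo name on the model page), and the heavy model download is kept inside a function so the sketch can be read without triggering it:

```python
# Minimal inference sketch; usage mirrors DeepSeek-R1-Distill-Qwen-14B.
MODEL_ID = "qihoo360/Light-R1-14B-DS"  # assumption: actual Hub repo id


def build_messages(question: str) -> list[dict]:
    # R1-style distilled models take a plain user turn; the chat template
    # supplies the reasoning scaffolding itself, so no system prompt is needed.
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 8192) -> str:
    # Heavy path: downloads ~14B weights on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # apply_chat_template formats the conversation and returns input ids.
    inputs = tok.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, dropping the prompt.
    return tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True)
```

Long-COT models emit extended reasoning before the final answer, so a generous `max_new_tokens` budget (several thousand tokens) is advisable for AIME-style problems.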