Update README.md
README.md CHANGED
@@ -71,6 +71,9 @@ In this section, we report the evaluation results of SmolLM2. All evaluations ar

## Instruction model Vs. Humanized model

+### Note
+
+We observe an unexpectedly worse TriviaQA score compared to the base instruct model. A small amount of additional training on a dataset such as squad-v2 quickly resolves this: a single epoch yields a TriviaQA score well above the base instruct model's (>21). We did not release that model because the one-epoch training worsened scores on other metrics. If your specific use case requires a better grasp of trivia, feel free to fine-tune on squad-v2 yourself.
+
| Metric | SmolLM2-1.7B-Instruct | SmolLM2-1.7B-Humanized | Difference |
|:-----------------------------|:---------------------:|:----------------------:|:----------:|
| MMLU | **49.5** | 48.8 | -0.7 |
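
If you want to try that recovery step yourself, a minimal sketch of a one-epoch fine-tune on squad_v2 with `transformers` might look like the following. The checkpoint id, prompt template, and hyperparameters are illustrative assumptions, not the exact recipe behind the score quoted above.

```python
# Hypothetical sketch: one epoch of supervised fine-tuning on squad_v2
# to recover trivia ability, as suggested in the note above.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "HuggingFaceTB/SmolLM2-1.7B-Instruct"  # assumed starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# squad_v2 rows have "context", "question" and "answers"; unanswerable
# questions carry an empty answers list, so map those to "unanswerable".
def to_text(example):
    answer = example["answers"]["text"][0] if example["answers"]["text"] else "unanswerable"
    prompt = (f"{example['context']}\n\nQuestion: {example['question']}\n"
              f"Answer: {answer}{tokenizer.eos_token}")
    return tokenizer(prompt, truncation=True, max_length=1024)

dataset = load_dataset("squad_v2", split="train").map(
    to_text, remove_columns=["id", "title", "context", "question", "answers"]
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="smollm2-squadv2",   # assumed output path
        num_train_epochs=1,             # a single epoch, as described in the note
        per_device_train_batch_size=2,  # illustrative hyperparameters
        gradient_accumulation_steps=8,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=dataset,
    # causal-LM collator: pads batches and copies input_ids into labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

As the note says, expect a trade-off: evaluate on the other benchmarks before adopting the resulting checkpoint.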