Merge branch 'main' of https://huggingface.co/bertin-project/bertin-roberta-base-spanish into main
README.md CHANGED
@@ -10,8 +10,8 @@ widget:
---

- [Version beta](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/beta): July 15th, 2021
-- [Version
-
+- [Version v1](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1): July 26th, 2021
+- [Version v1-512](https://huggingface.co/bertin-project/bertin-roberta-base-spanish/tree/v1-512): July 26th, 2021

# BERTIN

@@ -252,7 +252,7 @@ In addition to the tasks above, we also trained the [`beta`](https://huggingface

Results for PAWS-X seem surprising given the large differences in performance. However, this training was repeated to rule out failed runs, and the results seem consistent. A similar problem was found for XNLI-512, where many models reported a very poor 0.3333 accuracy on a first run (and even on a second, in the case of BSC-BNE). This suggests that training is somewhat unstable for some datasets under these conditions. Increasing the batch size and the number of epochs would be a natural attempt to fix this problem; however, that is not feasible within the project schedule. For example, the runtime for XNLI-512 was ~19h per model, and increasing the batch size without reducing the sequence length is not feasible on a single GPU.

-We are also releasing the fine-tuned models for `Gaussian`-512 and making it our version
+We are also releasing the fine-tuned models for `Gaussian`-512 and making it our version v1 default at 128 sequence length, since it experimentally shows better performance on the fill-mask task, while also releasing the 512 sequence length version (v1-512) for fine-tuning.

- POS: [`bertin-project/bertin-base-pos-conll2002-es`](https://huggingface.co/bertin-project/bertin-base-pos-conll2002-es/)
- NER: [`bertin-project/bertin-base-ner-conll2002-es`](https://huggingface.co/bertin-project/bertin-base-ner-conll2002-es/)
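For reference, the tagged revisions added in this diff can be selected at load time. A minimal sketch, assuming the `transformers` library and that the repository default (`main`) points at the v1 / 128 sequence length checkpoint as described above:

```python
from transformers import pipeline

# v1 (the repository default): the 128 sequence length checkpoint,
# reported above to perform better on the fill-mask task.
fill_mask = pipeline("fill-mask", model="bertin-project/bertin-roberta-base-spanish")
print(fill_mask("Fui a la biblioteca a <mask> un libro."))

# v1-512: the 512 sequence length checkpoint intended for fine-tuning,
# selected via the `revision` argument pointing at the tag listed above.
fill_mask_512 = pipeline(
    "fill-mask",
    model="bertin-project/bertin-roberta-base-spanish",
    revision="v1-512",
)
```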
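Similarly, the fine-tuned checkpoints listed above can be used through the token-classification pipeline. A hedged sketch, assuming the standard `transformers` API; the exact label sets come from each model's own configuration:

```python
from transformers import pipeline

# NER model fine-tuned on CoNLL-2002 (es); "ner" is an alias
# of the "token-classification" pipeline.
ner = pipeline("ner", model="bertin-project/bertin-base-ner-conll2002-es")

# POS-tagging model fine-tuned on CoNLL-2002 (es).
pos = pipeline("token-classification", model="bertin-project/bertin-base-pos-conll2002-es")

text = "Miguel de Cervantes nació en Alcalá de Henares."
print(ner(text))
print(pos(text))
```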