Update README.md
README.md
CHANGED
@@ -22,6 +22,23 @@ widget:
# bkai-foundation-models/vietnamese-bi-encoder

This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.

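As a rough illustration of how dense vectors support semantic search, the sketch below ranks documents by cosine similarity to a query. The 3-dimensional vectors are made-up stand-ins for the 768-dimensional embeddings the model would produce, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy embeddings (assumption: real ones would come from the encoder).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]
ranking = rank_by_similarity(query, docs)
print(ranking)
```

Cosine similarity ignores vector length, so the second document ranks first even though its norm differs from the query's.
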
We train the model on a merged training dataset that consists of:
- MS MARCO (translated into Vietnamese)
- SQuAD v2.0 (translated into Vietnamese)
- 80% of the training set from the Legal Text Retrieval Zalo 2021 challenge

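Using only part of a training set for training leaves the rest available as a held-out evaluation set. The sketch below shows one way to make such an 80/20 split reproducibly. The README does not say how the Zalo data was actually partitioned, so the seeded shuffle, the function name `split_train_eval`, and the query ids are all assumptions for illustration.

```python
import random

def split_train_eval(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, eval) parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical query ids standing in for the challenge's training queries.
query_ids = [f"q{i}" for i in range(10)]
train_ids, eval_ids = split_train_eval(query_ids)
print(len(train_ids), len(eval_ids))
```

Fixing the seed keeps the two parts disjoint and identical across runs, which matters when the held-out part is later used to compare models.
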
We use phobert-base-v2 as the pre-trained backbone.

Here are the results on the remaining 20% of the training set from the Legal Text Retrieval Zalo 2021 challenge:

| Pretrained Model | Trained Datasets | Acc@1 | Acc@10 | Acc@100 | Pre@10 | MRR@10 |
|-------------------------------|---------------------------------------|:------------:|:-------------:|:--------------:|:-------------:|:-------------:|
| [Vietnamese-SBERT](https://huggingface.co/keepitreal/vietnamese-sbert) | - | 32.34 | 52.97 | 89.84 | 7.05 | 45.30 |
| Vietnamese-SBERT | MS MARCO | 54.06 | 84.69 | 93.75 | 8.33 | 64.56 |
| PhoBERT-base-v2 | MS MARCO | 47.81 | 77.19 | 92.34 | 7.72 | 58.37 |
| PhoBERT-base-v2 | MS MARCO + SQuAD v2.0 + 80% Zalo | 73.28 | 93.59 | 98.85 | 9.36 | 80.73 |

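The README does not define the reported metrics. For readers unfamiliar with them, the sketch below implements common textbook versions of Acc@k, Pre@k, and MRR@k, reported in percent. These definitions, the function names, and the toy data are assumptions and may differ from what the authors actually computed.

```python
def accuracy_at_k(ranked, relevant, k):
    """1.0 if any of the top-k retrieved ids is relevant, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k positions occupied by relevant ids."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def reciprocal_rank_at_k(ranked, relevant, k):
    """1/rank of the first relevant id within the top k, or 0.0 if none."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(run, qrels, k):
    """Average the per-query metrics over all queries, in percent."""
    n = len(run)
    return {
        f"Acc@{k}": 100 * sum(accuracy_at_k(r, qrels[q], k) for q, r in run.items()) / n,
        f"Pre@{k}": 100 * sum(precision_at_k(r, qrels[q], k) for q, r in run.items()) / n,
        f"MRR@{k}": 100 * sum(reciprocal_rank_at_k(r, qrels[q], k) for q, r in run.items()) / n,
    }

# Hypothetical retrieval output (ranked ids per query) and relevance labels.
run = {"q1": ["d3", "d1", "d7"], "q2": ["d5", "d2", "d9"]}
qrels = {"q1": {"d1"}, "q2": {"d4"}}
scores = evaluate(run, qrels, k=3)
print(scores)
```

In this toy run, only `q1` retrieves its relevant document, at rank 2, so it contributes a reciprocal rank of 1/2 while `q2` contributes 0 to every metric.
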
<!--- Describe your model here -->