Commit 22233c6
Parent(s): d9eb9e6

Update README.md

README.md CHANGED
@@ -218,34 +218,6 @@ inference: false
 Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
 quantized version of [facebook/nllb-200-distilled-1.3B](https://huggingface.co/facebook/nllb-200-distilled-1.3B)
-```bash
-pip install hf-hub-ctranslate2>=2.12.0 ctranslate2>=3.16.0
-```
-
-```python
-# from transformers import AutoTokenizer
-model_name = "michaelfeil/ct2fast-nllb-200-distilled-1.3B"
-
-
-from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
-model = TranslatorCT2fromHfHub(
-    # load in int8 on CUDA
-    model_name_or_path=model_name,
-    device="cuda",
-    compute_type="int8_float16",
-    # tokenizer=AutoTokenizer.from_pretrained("{ORG}/{NAME}")
-)
-outputs = model.generate(
-    text=["def fibonnaci(", "User: How are you doing? Bot:"],
-    max_length=64,
-)
-print(outputs)
-```
-
-Checkpoint compatible to [ctranslate2>=3.16.0](https://github.com/OpenNMT/CTranslate2)
-and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
-- `compute_type=int8_float16` for `device="cuda"`
-- `compute_type=int8` for `device="cpu"`
 
 Converted on 2023-06-23 using
 ```
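The removed `generate` call was fed code-completion and chatbot prompts (`"def fibonnaci("`, `"User: How are you doing? Bot:"`), which read like leftovers from a text-generation README template, while NLLB-200 is a translation model. A minimal sketch of the same API with a translation-style input, assuming the `TranslatorCT2fromHfHub` constructor and `generate` signature exactly as shown in the removed block (the example sentence is an illustrative assumption):

```python
# Sketch only: same TranslatorCT2fromHfHub API as in the removed block,
# with a translation-style input instead of the text-generation prompts.
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub

model = TranslatorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-nllb-200-distilled-1.3B",
    device="cuda",                # per the removed notes, "cpu" also works
    compute_type="int8_float16",  # use compute_type="int8" on CPU
)
outputs = model.generate(
    text=["How are you doing today?"],  # illustrative sentence (assumption)
    max_length=64,
)
print(outputs)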
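The removed text also states the checkpoint is compatible with `ctranslate2>=3.16.0` directly, without the `hf-hub-ctranslate2` wrapper. A sketch of that path, following the NLLB usage pattern in the CTranslate2 documentation; the local model directory, language codes, and sample sentence are assumptions:

```python
# Sketch based on the CTranslate2 documentation's NLLB usage; the model
# path, language codes, and input sentence here are assumptions.
import ctranslate2
import transformers

translator = ctranslate2.Translator(
    "ct2fast-nllb-200-distilled-1.3B",  # local directory with the converted model
    device="cpu",
    compute_type="int8",  # int8 is the CPU setting per the removed notes
)
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "facebook/nllb-200-distilled-1.3B", src_lang="eng_Latn"
)

source = tokenizer.convert_ids_to_tokens(tokenizer.encode("How are you doing today?"))
results = translator.translate_batch([source], target_prefix=[["fra_Latn"]])
target = results[0].hypotheses[0][1:]  # drop the target-language prefix token
print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```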