---
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
base_model:
- anthracite-org/magnum-v4-123b
---

# Magnum-v4-123b HQQ

This repo contains [magnum-v4-123b](https://huggingface.co/anthracite-org/magnum-v4-123b) quantized to 4-bit precision using [HQQ](https://github.com/mobiusml/hqq/).

At 4-bit, HQQ provides precision comparable to AWQ, but requires no calibration data.

This quant was generated on 8x A40 GPUs in under 10 minutes, using the script below:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_path = "anthracite-org/magnum-v4-123b"

# 4-bit weights, group size 128, grouping along axis 1
quant_config = HqqConfig(nbits=4, group_size=128, axis=1)

# Passing quantization_config quantizes the weights on the fly while loading
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    cache_dir=".",
    device_map="cuda:0",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Write the quantized weights and tokenizer to a local directory
output_path = "magnum-v4-123b-hqq-4bit"
model.save_pretrained(output_path)
tokenizer.save_pretrained(output_path)
```
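As a rough back-of-the-envelope estimate (not a measured figure), 123B parameters at 4 bits come to around 62 GB of weights, plus per-group scales and zero points at `group_size=128`. The quantized model therefore still needs to be split across at least two 48 GB GPUs, which is why the inference command below shards it across two devices.
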
## Inference

You can perform inference directly with transformers, or using [aphrodite](https://github.com/PygmalionAI/aphrodite-engine):

```sh
pip install aphrodite-engine

# -tp 2 shards the model across 2 GPUs with tensor parallelism
aphrodite run alpindale/magnum-v4-123b-hqq-4bit -tp 2
```
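
For the transformers route, a minimal generation sketch along these lines should work, assuming the quantization config saved with the checkpoint is picked up automatically on load; the prompt and sampling settings here are illustrative, not recommended values:

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The published 4-bit HQQ quant (or a local path to the directory saved above)
model_path = "alpindale/magnum-v4-123b-hqq-4bit"

# The quantization config stored with the checkpoint should be applied on load
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

messages = [{"role": "user", "content": "Write a short scene set in a rainy city."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```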