---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---
# WizardLM 70B V1.0 - EXL2
- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Float16 model: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) – float16 conversion of WizardLM 70B V1.0
## Models available in this repository
| Branch | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | ExLlama | Max Context Length |
| ------ | --------- | --------------- | ------------------------ | ----------- | ---------------- | ---- | ------- | ------------------ |
| [main](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/tree/main) | 4.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train) (wikitext-2-raw-v1) | 33GB | [V2](https://github.com/turboderp/exllamav2) | 4096 |
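As a sanity check, the ~33GB size in the table follows directly from the average bitrate: a rough lower bound is parameter count × bits per weight ÷ 8. The figures below are that back-of-envelope estimate, not a measurement; it ignores the 6-bit head layers and file overhead:

```python
# Back-of-envelope size estimate for the 4.0 bpw branch above.
# Ignores the 6-bit head (-hb 6) layers and container overhead,
# so it slightly underestimates the real file size.
n_params = 70e9   # WizardLM 70B parameter count
bpw = 4.0         # average bits per weight (-b 4.0)

approx_bytes = n_params * bpw / 8
approx_gib = approx_bytes / 2**30
print(f"{approx_gib:.1f} GiB")  # → 32.6 GiB, consistent with the 33GB in the table
```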
## Description:
This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).
EXL2 is a new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ and allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
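To make "mixing quantization levels" concrete: the average bitrate is simply the weight-count-weighted mean of the per-layer bit widths. A toy illustration (the layer sizes and bit widths below are made up, not measured from this model):

```python
# Toy illustration of EXL2's mixed-precision averaging: each entry is
# (number of weights, bits per weight) for one hypothetical layer group.
layers = [
    (8_000_000, 3.0),    # less sensitive layers stored at 3 bits
    (16_000_000, 4.0),   # bulk of the weights at 4 bits
    (8_000_000, 5.0),    # sensitive layers kept at 5 bits
]

total_bits = sum(n * bits for n, bits in layers)
total_weights = sum(n for n, _ in layers)
avg_bpw = total_bits / total_weights  # → 4.0 bits per weight on average
```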
## Prompt template (official):
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
```
## Prompt template (suggested):
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:
```
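For programmatic use, the suggested template amounts to plain string concatenation. A minimal sketch (`build_prompt` is a hypothetical helper name, not part of any library):

```python
# Build a prompt in the suggested WizardLM format shown above.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(prompt: str) -> str:
    # Newlines follow the suggested multi-line template; replace them
    # with single spaces to get the official single-line variant.
    return f"{SYSTEM}\nUSER:\n{prompt}\nASSISTANT:\n"
```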
## Quantization process:
| Original Model | → | Float16 Model | → | Safetensor Model | → | EXL2 Model |
| -------------- | --- | ------------- | --- | ---------------- | --- | ---------- |
| [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | → | [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) | → | Safetensor* | → | EXL2 |
Example command to convert the float16 safetensors model to EXL2 at 4.0 bpw with a 6-bit head:
```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 # 4.0 bpw (-b), 6-bit head (-hb), wikitext parquet as calibration data (-c)
```
(*) Use any one of the following scripts to convert your float16 `pytorch_model*.bin` files to safetensors:
- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e
- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py