---
inference: false
license: llama2
model_creator: WizardLM
model_link: https://huggingface.co/WizardLM/WizardLM-70B-V1.0
model_name: WizardLM 70B V1.0
model_type: llama
quantized_by: Thireus
---
# WizardLM 70B V1.0 - EXL2
- Model creator: [WizardLM](https://huggingface.co/WizardLM)
- Original model: [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0)
- Float16 model: [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) – float16 conversion of WizardLM 70B V1.0
## Models available in this repository
| Branch | BITS (-b) | HEAD BITS (-hb) | MEASUREMENT LENGTH (-ml) | LENGTH (-l) | CAL DATASET (-c) | Size | ExLlama | Max Context Length |
| ------ | --------- | --------------- | ------------------------ | ----------- | ---------------- | ---- | ------- | ------------------ |
| [main](https://huggingface.co/Thireus/WizardLM-70B-V1.0-HF-4.0bpw-h6-exl2/tree/main) | 4.0 | 6 | 2048 | 2048 | [0000.parquet](https://huggingface.co/datasets/wikitext/tree/refs%2Fconvert%2Fparquet/wikitext-2-raw-v1/train) (wikitext-2-raw-v1) | 33GB | [V2](https://github.com/turboderp/exllamav2) | 4096 |
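As a sanity check, the ~33GB size in the table follows directly from the average bitrate: a rough lower bound is parameter count × bits per weight ÷ 8. The figures below are that back-of-envelope estimate, not a measurement; it ignores the 6-bit head layers and file overhead:

```python
# Back-of-envelope size estimate for the 4.0 bpw branch above.
# Ignores the 6-bit head (-hb 6) layers and container overhead,
# so it slightly underestimates the real file size.
n_params = 70e9   # WizardLM 70B parameter count
bpw = 4.0         # average bits per weight (-b 4.0)

approx_bytes = n_params * bpw / 8
approx_gib = approx_bytes / 2**30
print(f"{approx_gib:.1f} GiB")  # → 32.6 GiB, consistent with the 33GB in the table
```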
## Description:
This repository contains EXL2 model files for [WizardLM's WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0).
EXL2 is a new format used by [ExLlamaV2](https://github.com/turboderp/exllamav2). It is based on the same optimization method as GPTQ and allows mixing quantization levels within a model to achieve any average bitrate between 2 and 8 bits per weight.
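To make "mixing quantization levels" concrete: the average bitrate is simply the weight-count-weighted mean of the per-layer bit widths. A toy illustration (the layer sizes and bit widths below are made up, not measured from this model):

```python
# Toy illustration of EXL2's mixed-precision averaging: each entry is
# (number of weights, bits per weight) for one hypothetical layer group.
layers = [
    (8_000_000, 3.0),    # less sensitive layers stored at 3 bits
    (16_000_000, 4.0),   # bulk of the weights at 4 bits
    (8_000_000, 5.0),    # sensitive layers kept at 5 bits
]

total_bits = sum(n * bits for n, bits in layers)
total_weights = sum(n for n, _ in layers)
avg_bpw = total_bits / total_weights  # → 4.0 bits per weight on average
```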
## Prompt template (official):
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:
```
## Prompt template (suggested):
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER:
{prompt}
ASSISTANT:
```
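For programmatic use, the suggested template amounts to plain string concatenation. A minimal sketch (`build_prompt` is a hypothetical helper name, not part of any library):

```python
# Build a prompt in the suggested WizardLM format shown above.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(prompt: str) -> str:
    # Newlines follow the suggested multi-line template; replace them
    # with single spaces to get the official single-line variant.
    return f"{SYSTEM}\nUSER:\n{prompt}\nASSISTANT:\n"
```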
## Quantization process:
| Original Model | → | Float16 Model | → | Safetensor Model | → | EXL2 Model |
| -------------- | --- | ------------- | --- | ---------------- | --- | ---------- |
| [WizardLM 70B V1.0](https://huggingface.co/WizardLM/WizardLM-70B-V1.0) | → | [WizardLM 70B V1.0-HF](https://huggingface.co/simsim314/WizardLM-70B-V1.0-HF) | → | Safetensor* | → | EXL2 |
Example command to convert the float16 safetensors model to EXL2 at 4.0 bpw with a 6-bit head:
```
mkdir -p ~/EXL2/WizardLM-70B-V1.0-HF_4bit # Create the output directory
python convert.py -i ~/float16_safetensored/WizardLM-70B-V1.0-HF -o ~/EXL2/WizardLM-70B-V1.0-HF_4bit -c ~/EXL2/0000.parquet -b 4.0 -hb 6 # 4.0 bpw (-b), 6-bit head (-hb), wikitext parquet as calibration data (-c)
```
(*) Use any one of the following scripts to convert your float16 `pytorch_model*.bin` files to safetensors:
- https://github.com/turboderp/exllamav2/blob/master/util/convert_safetensors.py
- https://huggingface.co/Panchovix/airoboros-l2-70b-gpt4-1.4.1-safetensors/blob/main/bin2safetensors/convert.py
- https://gist.github.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e
- https://github.com/oobabooga/text-generation-webui/blob/main/convert-to-safetensors.py