Requantized everything with new pre-tokenizer
- OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.IQ4_XS.gguf +2 -2
- OpenCodeInterpreter-DS-6.7B.imatrix.dat +2 -2
- README.md +5 -5
OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873
+size 1530080384
OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7662673acdded25afa5846d17ea6391d8e0d8c57bd434e5ffa27a9761488140c
+size 2361355392
OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:66e7d5bbe7eadc9166c5fd699082521e8a72b00ab12696912809583be567f321
+size 2198170752
OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:0a13b4cb2a81348911d6d9ca9a587e9ff7b08478974cb5e30502792ca6ce2a46
+size 2036411520
OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d0660175e93185ca92b8e01a2cecc043f643cc23b4357306918be58948ba3ec4
+size 1856449664
OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:4482aa902c96eeaebbabc76f81129bc2e67284182a2073487429ed150a58f6c5
+size 3116608640
OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e1c65eb086d34bafbe005a1d320d9709a0dd080af89edef574c074ef73d27be9
+size 2950048896
OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:af3525ae7e915200a857a1d6012eb63eb5ebde0c0fdb077a047c57cbccc3812c
+size 2798267520
OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:22e420c184552cc7028b5547c436c138ca91415f75a229bc6f378232416ea980
+size 2586995840
OpenCodeInterpreter-DS-6.7B.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:41bcc0e72a94b988446daab9daa6f8f6327065d5d51330ed91bc8d7b810c3c79
+size 3621186688
OpenCodeInterpreter-DS-6.7B.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d799acf6c89364444c40223faadf8696a469ad80d6621e0327e2a84524f59c42
+size 4562176
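Every diff above touches only a Git LFS pointer: line 2 of the pointer records the new SHA-256 (`oid sha256:...`) and line 3 the new byte count (`size ...`), so a downloaded file can be checked against the requantized release without any extra tooling. Below is a minimal verification sketch in Python, assuming the IQ1_S file has already been downloaded into the current directory; the filename, oid, and size are taken from the first pointer diff above.

```python
import hashlib
from pathlib import Path

# Expected values from the updated LFS pointer for IQ1_S (see diff above).
EXPECTED_OID = "a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873"
EXPECTED_SIZE = 1530080384

path = Path("OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf")

# Cheap check first: the byte count must match the pointer's size line.
assert path.stat().st_size == EXPECTED_SIZE, "size mismatch, re-download the file"

# Then hash the file in 1 MiB chunks and compare against the oid line.
sha = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

assert sha.hexdigest() == EXPECTED_OID, "checksum mismatch, file is stale or corrupt"
print("OK: local file matches the requantized LFS pointer")
```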
README.md CHANGED
@@ -23,9 +23,9 @@ quantized_by: CISC
 
 This repo contains State Of The Art quantized GGUF format model files for [OpenCodeInterpreter DS 6.7B](https://huggingface.co/m-a-p/OpenCodeInterpreter-DS-6.7B).
 
-Quantization was done with an importance matrix that was trained for ~1M tokens (
+Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
-
+Everything has been reconverted and quantized with a new importance matrix using llama.cpp from April 29th 2024 onwards, as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa) to ensure correct pre-tokenization. The new GGUFs will work with older llama.cpp, but this may not generate correct prompt tokens, please use a recent build to ensure the best possible results!
 
 <!-- description end -->
 
@@ -59,6 +59,7 @@ They are also compatible with many third party UIs and libraries provided they a
 The new methods available are:
 
 * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+* GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
 * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
 * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
 * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -68,6 +69,7 @@ The new methods available are:
 * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
 * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
 * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+* GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
 Refer to the Provided Files table below to see what files use which methods, and how.
 </details>
@@ -78,7 +80,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss -
+| [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
 | [OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
 | [OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
 | [OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
@@ -91,8 +93,6 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 Generated importance matrix file: [OpenCodeInterpreter-DS-6.7B.imatrix.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix.dat)
 
-Generated importance matrix file (4K context): [OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat)
-
 **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
 <!-- README_GGUF.md-provided-files end -->
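The updated description asks for a llama.cpp build from commit f4ab2a4 or newer so that the corrected pre-tokenizer metadata in these GGUFs is actually honored. Below is a minimal sketch of downloading and loading one of the requantized files, assuming `huggingface_hub` and a recent `llama-cpp-python` (which bundles llama.cpp) are installed; the IQ4_XS pick and the sample prompt are illustrative only, not an official usage recipe.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the requantized GGUFs from this repo.
model_path = hf_hub_download(
    repo_id="CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF",
    filename="OpenCodeInterpreter-DS-6.7B.IQ4_XS.gguf",
)

# A recent llama.cpp (>= commit f4ab2a4) reads the corrected pre-tokenizer
# from the GGUF metadata; older builds may tokenize prompts incorrectly.
llm = Llama(model_path=model_path, n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python one-liner that reverses a string."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```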