Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ The INT8 data type is both friendly and efficient for most hardware platforms.
 
 In benchmarking, we observe **no accuracy loss** and up to **33\%** performance enhancement.
 
-[
+Thanks to our merged [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730), [SGLang](https://github.com/sgl-project/sglang/tree/main) is now support the block-wise INT8 quantization operation.
 
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):
 | Model | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) |