meituan/DeepSeek-R1-Block-INT8

Text Generation

8-bit precision

yuanzu committed 16 days ago

Commit 752a83d (verified) · 1 Parent(s): 1842c56

Update README.md

Files changed (1)

README.md +1 -1

@@ -11,7 +11,7 @@ The INT8 data type is both friendly and efficient for most hardware platforms.
 In benchmarking, we observe **no accuracy loss** and up to **33\%** performance enhancement.
-[SGLang](https://github.com/sgl-project/sglang/tree/main) will soon support the block-wise INT8 quantization operation once our [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730) is merged.
+Thanks to our merged [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730), [SGLang](https://github.com/sgl-project/sglang/tree/main) is now support the block-wise INT8 quantization operation.
 ## 1. Benchmarking Result (detailed in [PULL REQUEST](https://github.com/sgl-project/sglang/pull/3730)):
 | Model  | Config | Accuracy (GSM8K) | Accuracy (MMLU) | Output Throughput(qps=128) |