Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128
Hello, I am using four A800 servers, each with 8x 40 GB GPUs (32 GPUs in total), to deploy meituan/DeepSeek-R1-Block-INT8.
The startup commands I used, one per node, are:
python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 0 --trust-remote --host 0.0.0.0 --port 30000 --enable-torch-compile --torch-compile-max-bs 8
python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 1 --trust-remote --enable-torch-compile --torch-compile-max-bs 8
python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 2 --trust-remote --enable-torch-compile --torch-compile-max-bs 8
python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 3 --trust-remote --enable-torch-compile --torch-compile-max-bs 8
However, during startup, I encountered the following error:
"Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128"
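For reference, here is a minimal sketch of the arithmetic that check appears to perform. It assumes the 576 comes from splitting an output dimension of 18432 (which I believe is the intermediate_size in this model's config.json, but that is my assumption) evenly across the 32 tensor-parallel ranks. The function name and the FULL_SIZE constant are hypothetical and are not taken from the SGLang source.

```python
def partition_is_block_aligned(full_size: int, tp_size: int, block_n: int = 128) -> bool:
    """Return True if each rank's slice is a whole number of quantization blocks."""
    if full_size % tp_size != 0:
        # The dimension cannot even be split evenly across the ranks.
        return False
    return (full_size // tp_size) % block_n == 0


# Assumed value: intermediate_size from config.json (not confirmed by the error log).
FULL_SIZE = 18432

for tp in (8, 16, 32):
    part = FULL_SIZE // tp
    print(f"tp={tp}: partition={part}, aligned={partition_is_block_aligned(FULL_SIZE, tp)}")
```

Under that assumption, tp=8 and tp=16 give partitions of 2304 and 1152, both multiples of 128, while tp=32 gives 576, which matches the number in the error message.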
I saw someone mention that the quantization_config block in config.json can be removed, but Meituan requires this section to be present, so I cannot delete it.
Why is this happening, and how can I resolve it?