Pixtral-12B-2409: int4 Weight Quant

W4A16 quant of mistral-community/pixtral-12b using kylesayrs/gptq-partition branch of LLM Compressor for optimised inference on VLLM.

vision_tower kept at FP16. language_model weights quantized to 4bit.

Calibrated on 512 flickr samples.

Example VLLM usage

vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4' 

If you want a more advanced/fully featured chat template you can use this jinja template

Downloads last month
99
Safetensors
Model size
3.23B params
Tensor type
I64
·
I32
·
BF16
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The HF Inference API does not support image-text-to-text models for vllm library.

Model tree for nintwentydo/pixtral-12b-2409-W4A16-G128

Quantized
(5)
this model