Pixtral-12B-2409: int4 Weight Quant

vision_tower kept at FP16. language_model weights quantized to 4bit.

Calibrated on 512 flickr samples.

Example VLLM usage

vllm serve nintwentydo/pixtral-12b-2409-W4A16-G128 --max-model-len 131072 --limit-mm-per-prompt 'image=4'

If you want a more advanced/fully featured chat template you can use this jinja template

Safetensors

Model size

3.23B params

Tensor type

I64

I32

BF16

Inference Providers NEW

This model is not currently available via any of the supported Inference Providers.

The model cannot be deployed to the HF Inference API: The HF Inference API does not support image-text-to-text models for vllm library.

Model tree for nintwentydo/pixtral-12b-2409-W4A16-G128

Base model

Quantized

(5)

this model