You are viewing the main version, which requires installation from source. For a regular pip install, check out the latest stable version (v4.49.0).


Optimum

Optimum is an optimization library that supports quantization for Intel, Furiosa, ONNX Runtime, GPTQ, and lower-level PyTorch quantization functions. It is designed to improve performance on specific hardware (Intel CPUs/HPUs, AMD GPUs, Furiosa NPUs, and others) and with model accelerators such as ONNX Runtime.
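As an illustration of the ONNX Runtime path mentioned above, the following is a minimal sketch of dynamic int8 quantization with Optimum. It assumes the ONNX Runtime extra is installed (for example with `pip install optimum[onnxruntime]`). The function name `quantize_with_onnxruntime`, the choice of a sequence-classification model class, and the AVX512-VNNI target are assumptions made for this example, not something this page prescribes.

```python
def quantize_with_onnxruntime(model_id, save_dir):
    """Export a Hub checkpoint to ONNX and apply dynamic int8 quantization.

    Hypothetical helper for illustration; returns whatever
    ORTQuantizer.quantize returns (the location of the quantized model).
    """
    # Imported lazily so that defining this helper does not require Optimum.
    from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # export=True converts the PyTorch checkpoint to ONNX while loading it.
    onnx_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    quantizer = ORTQuantizer.from_pretrained(onnx_model)

    # Dynamic (is_static=False) quantization config for CPUs with AVX512-VNNI.
    # Other targets would use a different AutoQuantizationConfig constructor.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

    return quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
```

Calling the helper, for instance `quantize_with_onnxruntime("distilbert-base-uncased-finetuned-sst-2-english", "quantized_model")`, downloads the checkpoint from the Hub, so it needs network access. The model ID and output directory here are only example values.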
