---
language:
- en
pipeline_tag: text-generation
base_model:
- google/gemma-2-2b
license: gemma
---
# gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16-onnx
- ## Introduction
    This model was created by applying [Quark](https://quark.docs.amd.com/latest/index.html) with calibration samples from the Pile dataset.
- ## Quantization Strategy
    - ***Quantized Layers***: All linear layers
    - ***Weight***: uint4 asymmetric per-group; group_size=32 for `lm_head` and group_size=128 for all other layers.
- ## Quick Start
    1. [Download and install Quark](https://quark.docs.amd.com/latest/install.html)
    2. Run the quantization script in the example folder using the following command line:
    ```sh
    # MODEL_DIR can point to a local model checkpoint folder or to google/gemma-2-2b
    export MODEL_DIR=google/gemma-2-2b

    # single GPU
    python quantize_quark.py --model_dir $MODEL_DIR \
                             --output_dir gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16 \
                             --quant_scheme w_uint4_per_group_asym \
                             --num_calib_data 128 \
                             --quant_algo awq \
                             --dataset pileval_for_awq_benchmark \
                             --model_export hf_format \
                             --group_size 128 \
                             --group_size_per_layer lm_head 32 \
                             --data_type float32 \
                             --exclude_layers

    # CPU
    python quantize_quark.py --model_dir $MODEL_DIR \
                             --output_dir gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16 \
                             --quant_scheme w_uint4_per_group_asym \
                             --num_calib_data 128 \
                             --quant_algo awq \
                             --dataset pileval_for_awq_benchmark \
                             --model_export hf_format \
                             --group_size 128 \
                             --group_size_per_layer lm_head 32 \
                             --data_type float32 \
                             --exclude_layers \
                             --device cpu
    ```

## Deployment
Quark has its own export format, `quark_safetensors`, which is compatible with AutoAWQ exports (an illustrative loading sketch is included at the end of this card).

#### License
Modifications copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
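
#### Loading Example
As the Deployment section notes, the `quark_safetensors` export is compatible with AutoAWQ. The snippet below is a minimal, illustrative sketch of loading such an export with AutoAWQ and Transformers; the `quant_path` value (the output folder produced by the Quick Start command), the prompt, and the generation settings are placeholders and have not been validated against this specific checkpoint.

```python
# Minimal sketch, assuming the quark_safetensors export can be loaded like an
# AutoAWQ checkpoint (as stated in the Deployment section). quant_path is a
# placeholder for the output folder produced by the Quick Start command.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "gemma-2-2b-awq-uint4-asym-g128-lmhead-g32-fp16"

# Load the quantized model and its tokenizer from the exported folder.
# Layer fusion is left off here to keep the sketch conservative.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

# Run a short generation to verify that the checkpoint loads end to end.
tokens = tokenizer("The capital of France is", return_tensors="pt").input_ids.cuda()
output = model.generate(tokens, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```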