---
base_model: AtlaAI/Selene-1-Mini-Llama-3.1-8B
library_name: transformers
language:
- en
- de
- fr
- it
- pt
- es
pipeline_tag: text-generation
tags:
- llama
- atla
- evaluation
- llm-as-a-judge
- meta
- conversational
- lm-judge
- llama-cpp
- gptq
license: llama3.1
---
🛝 Playground | 📄 Technical report | 💻 GitHub | 👀 Sign up for the API
# AtlaAI/Selene-1-Mini-Llama-3.1-8B-GPTQ-W8A8

This model was quantised from [`AtlaAI/Selene-1-Mini-Llama-3.1-8B`](https://huggingface.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B) into an **8-bit** (W8A8: 8-bit weights and 8-bit activations) format using GPTQ and SmoothQuant. The quantisation was done with vLLM's [llm-compressor library](https://docs.vllm.ai/en/stable/features/quantization/int8.html).

Refer to the [original model card](https://huggingface.co/AtlaAI/Selene-1-Mini-Llama-3.1-8B) for more details on the model.

The quantisation was calibrated on a sample of 512 datapoints from the data used to train Selene-1-Mini. As a result, our quantised models show minimal performance degradation, losing <0.5% overall across benchmarks!

For reference, a GPTQ-quantised 8-bit [Llama-3.1-8B](https://huggingface.co/neuralmagic/Meta-Llama-3.1-8B-quantized.w8a8) shows ~1.5% degradation across benchmarks.
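
To make the W8A8 and SmoothQuant terms above concrete, here is a minimal pure-Python sketch of the idea. It is not the llm-compressor implementation and not the recipe used for this model. It uses plain round-to-nearest in place of GPTQ's error-compensating rounding, it derives the smoothing scales from the toy inputs themselves instead of from calibration data, and all function names and data are hypothetical.

```python
def quantize_rows(matrix):
    """Symmetric int8 quantisation with one scale per row (round-to-nearest)."""
    codes, scales = [], []
    for row in matrix:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs else 1.0
        codes.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return codes, scales


def float_matmul(x, w_t):
    """Reference product; w_t holds one row of weights per output channel."""
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w_t] for xr in x]


def w8a8_matmul(x, w_t):
    """Quantise activations per token and weights per output channel,
    accumulate in integers, then rescale the result back to floats."""
    x_q, x_s = quantize_rows(x)
    w_q, w_s = quantize_rows(w_t)
    return [[sum(a * b for a, b in zip(xr, wr)) * xs * ws
             for wr, ws in zip(w_q, w_s)]
            for xr, xs in zip(x_q, x_s)]


def smooth(x, w_t, alpha=0.5):
    """SmoothQuant-style rescaling: divide each activation channel by s_j and
    multiply the matching weight channel by s_j, so the float product is
    unchanged. Assumes no channel is all zeros."""
    s = []
    for j in range(len(x[0])):
        x_max = max(abs(row[j]) for row in x)
        w_max = max(abs(row[j]) for row in w_t)
        s.append(x_max ** alpha / w_max ** (1 - alpha))
    x_smooth = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_smooth = [[v * s[j] for j, v in enumerate(row)] for row in w_t]
    return x_smooth, w_smooth


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Hypothetical toy data: channel 2 of the activations is an outlier.
x = [[0.1, -0.2, 40.0, 0.3], [0.2, 0.1, -35.0, -0.1]]
w_t = [[0.5, -0.3, 0.02, 0.4], [-0.2, 0.6, 0.01, 0.25]]

reference = float_matmul(x, w_t)
plain_error = max_abs_error(w8a8_matmul(x, w_t), reference)
smooth_error = max_abs_error(w8a8_matmul(*smooth(x, w_t)), reference)
print(f"max error without smoothing: {plain_error:.4f}")
print(f"max error with smoothing:    {smooth_error:.4f}")
```

On this toy input the outlier channel forces a coarse activation scale, so the small activations round to zero or one. Smoothing moves part of that range into the weights, and the quantised product lands closer to the float reference.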