# Midnight Rose 70B v2.0.3 - GPTQ (naive)
a (naive) first attempt at quantizing a pre-trained model with AutoGPTQ, targeting sophosympatheia/Midnight-Rose-70B-v2.0.3

the base is a popular open-source model that scored highly on EQ-Bench and was specifically designed for storytelling and roleplaying

quantization dropped the model size from ~140GB down to ~35GB
## Notes
to start, i went through the first half of this gentle introduction to GPTQ on the huggingface blog.
then i followed some introductory examples and got a sense of the parameters to pick from TheBloke's READMEs, ultimately coming up with this code to perform the quantization & serve the model (roughly the sketch below)
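a minimal sketch of what that script did, assuming the AutoGPTQ API from the blog post; the 4-bit / group-size-128 settings are the common defaults from TheBloke's READMEs, and the output path is the one that shows up in the error later in this card:

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "sophosympatheia/Midnight-Rose-70B-v2.0.3"
out_dir = "/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit, group size 128: the common settings in TheBloke's READMEs
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# the one-sentence "calibration set" from the introductory example
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with "
        "user-friendly apis, based on GPTQ algorithm."
    )
]

model.quantize(examples)
model.save_quantized(out_dir, use_safetensors=True)
```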
running on an H100 on modal.com, the job took ~1.5hrs for roughly $10 (modal scaffolding sketched below)
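the modal wrapper looked something like this; a sketch only, the app name and image pins here are made up:

```python
import modal

app = modal.App("midnight-rose-gptq")  # hypothetical app name

image = modal.Image.debian_slim().pip_install("auto-gptq", "transformers", "torch")

# H100 because the cheaper A100 40GB run failed (see Hurdles below)
@app.function(gpu="H100", image=image, timeout=3 * 60 * 60)
def quantize():
    ...  # the AutoGPTQ steps from the sketch above
```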
one thing i skipped (hence the "naive" part) is quantizing with a quality calibration dataset, instead opting to first try the one sentence from the introductory example linked above. as TheBloke's READMEs put it:

> **GPTQ dataset:** The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy. Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).
i see TheBloke uses VMware/open-instruct, so i'll probably try to re-do it properly in order to compare results (roughly the sketch below)
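a rough sketch of what that re-do might look like; the column name and sample count are assumptions on my part, not TheBloke's exact recipe:

```python
import random

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sophosympatheia/Midnight-Rose-70B-v2.0.3")

dataset = load_dataset("VMware/open-instruct", split="train")

# ~128 random rows is a common calibration size; "response" is an assumed
# column name, check the dataset card for the real schema
indices = random.sample(range(len(dataset)), 128)
examples = [
    tokenizer(dataset[i]["response"], truncation=True, max_length=2048)
    for i in indices
]

# then quantize exactly as before: model.quantize(examples)
```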
## Hurdles
when trying to quantize on a cheaper A100 40GB, the run progressed more slowly, but also failed consistently at layer 9/80 with this error:
```
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite
(the leading minor of order 28622 is not positive-definite).
```
no idea what that^ means, so i just stuck with an H100 that didn't have the problem
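fwiw, GPTQ runs a Cholesky decomposition on each layer's input Hessian, and this error means the (damped) Hessian wasn't numerically positive-definite. a commonly suggested workaround, untested here, is bumping AutoGPTQ's damping factor:

```python
from auto_gptq import BaseQuantizeConfig

# damp_percent defaults to 0.01; raising it adds more diagonal damping to the
# Hessian before the Cholesky step (a common suggestion, not verified here)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, damp_percent=0.1)
```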
also, the first time running inference with it on vLLM, i got this error:
```
OSError: Can't load tokenizer for '/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq'.
If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name.
Otherwise, make sure '/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
```
but after copying these files over from the original model, it worked: `tokenizer.json`, `tokenizer.model`, `tokenizer_config.json`
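roughly the fix, plus the inference call that then worked; the source path is an assumption based on where the original weights would have been downloaded:

```python
import shutil

# copy the tokenizer files from the original model into the quantized output dir
src = "/models/sophosympatheia/Midnight-Rose-70B-v2.0.3"  # assumed download path
dst = "/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq"
for name in ("tokenizer.json", "tokenizer.model", "tokenizer_config.json"):
    shutil.copy(f"{src}/{name}", f"{dst}/{name}")

# then vLLM loads the quantized model fine
from vllm import LLM, SamplingParams

llm = LLM(model=dst, quantization="gptq", dtype="float16")
out = llm.generate(["Once upon a time"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```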