Midnight Rose 70B v2.0.3 - GPTQ (naive)

a (naive) first attempt at quantizing a pre-trained model with AutoGPTQ, targeting sophosympatheia/Midnight-Rose-70B-v2.0.3

the base is a popular open-source model that scored highly on EQBench and was specifically designed for storytelling and roleplaying

dropped the model size down from ~140GB to ~35GB (which checks out: 70B params × 2 bytes in fp16 ≈ 140GB, and 4-bit weights are roughly a quarter of that)

Notes

to start, i went through the first half of this gentle introduction to GPTQ on the huggingface blog.

then i followed some introductory examples and got a sense of which parameters to pick from TheBloke's READMEs, ultimately coming up with this code to perform the operation & serve the model
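
in sketch form, the quantization step looks like this (the parameter values and paths here are illustrative, borrowed from common choices in TheBloke's READMEs, not a verbatim copy of what ran):

```python
# a minimal sketch of the AutoGPTQ quantization step; parameter values
# are assumptions mirroring common choices from TheBloke's READMEs
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "sophosympatheia/Midnight-Rose-70B-v2.0.3"
out_dir = "/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq"

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights: ~140GB fp16 -> ~35GB
    group_size=128,  # a common group size in TheBloke's quants
    desc_act=True,   # "act-order"; slower to quantize, usually more accurate
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# the single calibration sentence from the introductory example --
# this is the "naive" part, see the calibration notes below
examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with "
        "user-friendly apis, based on GPTQ algorithm."
    )
]

model.quantize(examples)
model.save_quantized(out_dir)
```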

running on an H100 on modal.com took ~1.5hrs for roughly $10
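
the Modal scaffolding around it can be as small as this (a hypothetical sketch: the app name, image contents, and timeout are made up, and persisting the output would additionally need a modal.Volume):

```python
# a hypothetical sketch of the Modal wrapper; names & pins are assumptions
import modal

app = modal.App("midnight-rose-gptq")
image = modal.Image.debian_slim().pip_install("torch", "transformers", "auto-gptq")

@app.function(gpu="H100", timeout=4 * 60 * 60, image=image)
def quantize_model():
    # ... the AutoGPTQ code from the sketch above ...
    ...
```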

one thing i skipped (hence the "naive" part) is a quality calibration dataset to quantize with; instead i opted to first try the one sentence from the introductory example linked above

as TheBloke's READMEs put it:

> GPTQ dataset: The calibration dataset used during quantisation. Using a dataset more appropriate to the model's training can improve quantisation accuracy.
>
> Note that the GPTQ calibration dataset is not the same as the dataset used to train the model - please refer to the original model repo for details of the training dataset(s).

i see TheBloke uses VMWare/open-instruct, so i'll probably redo it properly and compare results (a sketch of what that swap might look like follows)
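
swapping in that dataset would look roughly like this (a sketch; the text column name and sample count are guesses, and `tokenizer`/`model` are as constructed in the quantization sketch above):

```python
# a sketch of quantizing against a real calibration dataset instead of
# a single sentence; column name & sample count are assumptions
from datasets import load_dataset

ds = load_dataset("VMware/open-instruct", split="train")
examples = [
    tokenizer(row["response"])  # assuming a "response" text column
    for row in ds.shuffle(seed=0).select(range(128))
]
model.quantize(examples)
```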

Hurdles

when trying to quantize on a cheaper A100 40GB, the run progressed more slowly, and it would also fail consistently at layer 9/80 with this error:

```
torch._C._LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite
(the leading minor of order 28622 is not positive-definite).
```

no idea what that^ means (it appears to come from the Cholesky factorization GPTQ performs on each layer's Hessian, which fails when numerical error leaves the matrix not quite positive-definite), so i just stuck with an H100 that didn't have the problem
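
a commonly suggested workaround (untried here) is raising the damping AutoGPTQ applies to the Hessian diagonal:

```python
# untested workaround sketch: raise damp_percent (default 0.01) so the
# Hessian stays positive-definite during the Cholesky step
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,
    damp_percent=0.1,  # 10x the default damping
)
```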

also, the first time i ran inference with it on vLLM, i got this error:

```
OSError: Can't load tokenizer for '/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq'.
If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name.
Otherwise, make sure '/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq' is the correct path to a directory containing all
relevant files for a LlamaTokenizerFast tokenizer.
```

but after copying these files over from the original model, it worked: tokenizer.json, tokenizer.model, tokenizer_config.json
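
for reference, loading the quantized model in vLLM then looks like this (a sketch; the prompt & sampling values are arbitrary):

```python
# a minimal sketch of running the quantized model with vLLM, after the
# three tokenizer files above were copied into the quantized model dir
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/sophosympatheia/midnight-rose-70b-v2.0.3-gptq",
    quantization="gptq",
)
params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Once upon a midnight dreary,"], params)
print(outputs[0].outputs[0].text)
```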
