
Quantization made by Richard Erkhov.

Github

Discord

Request more models

VanillaKD-Pretrain-Qwen-500M - EXL2

Available sizes

| Branch | Bits | Description |
|--------|------|-------------|
| 8_0 | 8.0 | Maximum quality that ExLlamaV2 can produce, near-unquantized performance. |
| 6_5 | 6.5 | Very similar to 8.0, good tradeoff of size vs. performance; recommended. |
| 5_0 | 5.0 | Slightly lower quality vs. 6.5, but usable. |
| 4_25 | 4.25 | GPTQ-equivalent bits per weight, slightly higher quality. |
| 3_5 | 3.5 | Lower quality, only use if you have to. |
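
Once one of the branches above has been downloaded (see the instructions below), it can be loaded with the exllamav2 Python package. The following is a minimal sketch, assuming the 6_5 branch sits in a local folder named VanillaKD-Pretrain-Qwen-500M-6_5; class and method names follow the examples shipped with ExLlamaV2 and may differ slightly between versions.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Point the config at the downloaded branch (the folder name is an assumption).
config = ExLlamaV2Config()
config.model_dir = "VanillaKD-Pretrain-Qwen-500M-6_5"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
generator.warmup()

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# This is a base pre-trained model, so prompts are plain-text continuations.
print(generator.generate_simple("The Pile is a dataset that", settings, 64))
```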

Download instructions

With git:

git clone --single-branch --branch 6_5 https://huggingface.co/MiniLLM_-_VanillaKD-Pretrain-Qwen-500M-exl2 VanillaKD-Pretrain-Qwen-500M-6_5

With huggingface hub:

pip3 install huggingface-hub

To download a specific branch, use the --revision parameter. For example, to download the 6.5 bpw branch on Linux:

huggingface-cli download MiniLLM_-_VanillaKD-Pretrain-Qwen-500M-exl2 --revision 6_5 --local-dir VanillaKD-Pretrain-Qwen-500M-6_5 --local-dir-use-symlinks False

Windows (which sometimes doesn't like _ in folder names):

huggingface-cli download MiniLLM_-_VanillaKD-Pretrain-Qwen-500M-exl2 --revision 6_5 --local-dir VanillaKD-Pretrain-Qwen-500M-6.5 --local-dir-use-symlinks False
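
The same download can also be done from Python with the huggingface_hub library. A minimal sketch, using the repo id exactly as written in the commands above (adjust it if the repository lives under a user or organization namespace):

```python
from huggingface_hub import snapshot_download

# Fetch the 6.5 bpw branch into a local folder.
snapshot_download(
    repo_id="MiniLLM_-_VanillaKD-Pretrain-Qwen-500M-exl2",
    revision="6_5",
    local_dir="VanillaKD-Pretrain-Qwen-500M-6_5",
)
```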

Original model description:

library_name: transformers
license: apache-2.0
datasets:
  - monology/pile-uncopyrighted
  - MiniLLM/pile-tokenized
language:
  - en
metrics:
  - accuracy
pipeline_tag: text-generation

VanillaKD-Pretrain-Qwen-500M

paper | code

VanillaKD-Pretrain-Qwen-500M is a 500M-parameter model with the Qwen architecture, pre-trained with vanilla token-level knowledge distillation on the Pile for 50B tokens. The teacher model is Qwen1.5-1.8B.
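
Vanilla token-level KD trains the student to match the teacher's next-token distribution at every position, typically by minimizing a forward KL term. The snippet below is a schematic PyTorch illustration of that objective, not the authors' training code; see the paper and code linked above for the exact formulation.

```python
import torch.nn.functional as F

def token_level_kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Schematic vanilla token-level KD loss.

    Forward KL from the teacher's next-token distribution to the student's,
    averaged over batch and sequence positions.
    Both logit tensors have shape [batch, seq_len, vocab_size].
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Per-token KL, summed over the vocabulary dimension.
    kl_per_token = F.kl_div(student_log_probs, teacher_probs, reduction="none").sum(-1)
    # The usual T^2 factor keeps gradient scale comparable across temperatures.
    return (t * t) * kl_per_token.mean()
```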

We also open-source the tokenized pre-training corpus for reproducibility.

It is used as the baseline for MiniLLM-Qwen-500M.
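
For reference, the unquantized checkpoint can be loaded with transformers in the usual way. The repo id below is an assumption (the related datasets are hosted under the MiniLLM organization); adjust it to the actual repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- adjust if the checkpoint is hosted elsewhere.
model_id = "MiniLLM/VanillaKD-Pretrain-Qwen-500M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a large, diverse corpus of", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```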

Evaluation

MiniPLM models achieve better performance given the same computation and scale well across model sizes:

[Figures: MiniPLM evaluation results and comparison with other baselines]

Citation

@article{miniplm,
    title={MiniPLM: Knowledge Distillation for Pre-Training Language Models}, 
    author={Yuxian Gu and Hao Zhou and Fandong Meng and Jie Zhou and Minlie Huang},
    journal={arXiv preprint arXiv:2410.17215},
    year={2024}
}