🆕 Releasing a new series of 8 zeroshot classifiers: better performance, fully commercially usable thanks to synthetic data, up to 8192 tokens of context, and they run on any hardware.
Summary:
🤖 The zeroshot-v2.0-c series replaces commercially restrictive training data with synthetic data generated with mistralai/Mixtral-8x7B-Instruct-v0.1 (Apache 2.0). All models are released under the MIT license.
🦾 The best model performs 17 percentage points better across 28 tasks vs. facebook/bart-large-mnli (the most downloaded commercially-friendly baseline).
🌏 The series includes a multilingual variant fine-tuned from BAAI/bge-m3 for zeroshot classification in 100+ languages, with a context window of 8192 tokens.
🪶 With only 0.2-0.6B parameters, the models run on any hardware. The base-size models are more than 2x faster than bart-large-mnli while performing significantly better.
🤏 The models are not generative LLMs; they are efficient encoder-only models specialized in zeroshot classification via the universal NLI task (see the usage sketch after this list).
🤑 For users for whom commercially restrictive training data is not an issue, I've also trained variants with even more human data for improved performance.
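For readers new to NLI-based zeroshot classification, here is a minimal sketch using the transformers zero-shot-classification pipeline. The checkpoint name is illustrative, not a confirmed model ID from this release; pick any model from the collection linked below.

```python
from transformers import pipeline

# Load an encoder-only NLI model as a zeroshot classifier.
# The checkpoint name here is illustrative; see the collection below for the actual models.
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v2.0",
)

text = "The stock market rallied after the central bank cut interest rates."
labels = ["economy", "sports", "politics", "science"]

# Each candidate label is inserted into an NLI hypothesis template, and the
# model scores how strongly the text entails that hypothesis.
result = classifier(text, labels, hypothesis_template="This text is about {}.")
print(result["labels"][0], round(result["scores"][0], 3))

# The multilingual bge-m3 variant works the same way; for non-English texts
# you can also translate the hypothesis template into the target language.
```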
Next steps:
✍️ I'll publish a blog post with more details soon.
🔮 There are several improvements I'm planning for v2.1. The multilingual model in particular has room for improvement.
All models are available for download in this HF中国镜像站 collection: MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f
These models are an extension of the approach explained in this paper, but with additional synthetic data: https://arxiv.org/abs/2312.17543