Primus - a trend-cybertron Collection

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

Paper • 2502.11191 • Published 27 days ago • 4

Note Start by reading the 🚀Primus Paper! To the best of our knowledge, we are the 🏄🏽‍♂️ first to release datasets covering cybersecurity pretraining, IFT, and reasoning distillation. Of course, we are also the first to pretrain an LLM on a large-scale cybersecurity corpus.

trend-cybertron/Llama-Primus-Base

Text Generation • Updated 10 days ago

Note Based on Llama-3.1-8B-Instruct, continually pretrained on 2.77B tokens of cybersecurity text, achieving a 🚀15.88% improvement in the aggregated score across multiple cybersecurity benchmarks.

trend-cybertron/Llama-Primus-Merged

Text Generation • Updated 10 days ago

Note Instruct Model! While maintaining nearly the same instruction-following capability as Llama-3.1-8B-Instruct, achieving a 🚀14.84% improvement across multiple cybersecurity benchmarks.

trend-cybertron/Llama-Primus-Reasoning

Text Generation • Updated 10 days ago

Note Distilled on reasoning and reflection data from o1-preview for cybersecurity tasks, achieving a 🚀10% improvement on CISSP.

trend-cybertron/Primus-Seed

Updated 10 days ago • 72

Note Includes high-quality cybersecurity texts manually collected from reputable sources such as wikipedia, MITRE, cybersecurity company websites, CTI, and more.

trend-cybertron/Primus-FineWeb

Updated 10 days ago • 74 • 1

Note Includes 2.57B tokens of cybersecurity texts filtered from FineWeb.

trend-cybertron/Primus-Instruct

Updated 10 days ago • 67 • 1

Note Includes approximately 1K QA pairs covering common cybersecurity business scenarios.

trend-cybertron/Primus-Reasoning

Updated 10 days ago • 65 • 1

Note Includes reasoning and reflection data generated by o1-preview on cybersecurity tasks for distillation.