Llama 3.1 8B Experimental 1206

Overall Strengths

  1. Logical and Boolean Reasoning – Excels in tasks requiring clear, rule-based logic and manipulation of true/false statements.
  2. Focused Domain Knowledge – Strong at certain specialized tasks (sports rules, ruin names, hyperbaton) that blend world knowledge with language comprehension.
  3. Good Instruction Compliance – High prompt-level and instance-level accuracy (both strict and loose) indicates that it follows user instructions effectively, even on complex or nuanced prompts.
  4. Reasonable Multi-step Reasoning – While not the best in every logic category, it still shows solid performance (60%+) on tasks like disambiguation and causal reasoning.
  5. Extended Context Window (128k) – The 128k-token context window allows the model to handle lengthy inputs and maintain coherence across extensive passages or multi-turn conversations. This is especially valuable for long-document question answering, summarization, and complex scenario analysis, where context retention is crucial (see the usage sketch after this list).
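
Below is a minimal inference sketch using the Hugging Face transformers library. It assumes the repository ships standard transformers-format weights (the card lists 8.03B parameters in BF16 Safetensors) and that the model follows the usual Llama 3.1 Instruct chat template; neither is confirmed by this card, so treat it as a starting point rather than an official recipe.

```python
# Minimal inference sketch: assumes transformers-format weights and the
# standard Llama 3.1 chat template (not confirmed by this model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",           # requires the `accelerate` package
)

messages = [
    {"role": "user", "content": "Is the following statement true or false? All squares are rectangles."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```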

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric              | Value |
|---------------------|------:|
| Avg.                | 25.67 |
| IFEval (0-shot)     | 69.67 |
| BBH (3-shot)        | 30.06 |
| MATH Lvl 5 (4-shot) | 11.10 |
| GPQA (0-shot)       |  6.60 |
| MuSR (0-shot)       |  8.50 |
| MMLU-PRO (5-shot)   | 28.10 |
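
The task suite and shot counts above match the Open LLM Leaderboard v2 configuration, which is built on EleutherAI's lm-evaluation-harness. The following is a hedged sketch of reproducing these numbers locally; the `leaderboard_*` task names and the `simple_evaluate` entry point come from the harness's public API, not from this card, so verify them against your installed version.

```python
# Sketch: re-run the leaderboard tasks with lm-evaluation-harness
# (`pip install lm-eval`). Few-shot counts are baked into each task config.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct,"
        "dtype=bfloat16"
    ),
    tasks=[
        "leaderboard_ifeval",     # IFEval, 0-shot
        "leaderboard_bbh",        # BBH, 3-shot
        "leaderboard_math_hard",  # MATH Lvl 5, 4-shot
        "leaderboard_gpqa",       # GPQA, 0-shot
        "leaderboard_musr",       # MuSR, 0-shot
        "leaderboard_mmlu_pro",   # MMLU-PRO, 5-shot
    ],
    batch_size=4,
)
print(results["results"])
```
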
Model size: 8.03B params · Tensor type: BF16 (Safetensors)

Model repository: sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
