metadata
license: llama3
pipeline_tag: text-generation
base_model: OwenArli/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1
QuantFactory/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1-GGUF
This is a quantized version of OwenArli/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1, created using llama.cpp.
Model Description
This model is based on Meta-Llama-3-8B-Instruct and is governed by the Meta Llama 3 License agreement: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
It was fine-tuned with the ORPO method on the following datasets:
- https://huggingface.co/datasets/Intel/orca_dpo_pairs
- https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo
- https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2
- https://huggingface.co/datasets/M4-ai/prm_dpo_pairs_cleaned
- https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1
Although toxic datasets were included to reduce refusals, this model is still relatively safe; it refuses less often than the original Meta model.
As of now, ORPO fine-tuning seems to improve some metrics while reducing others by a lot:
Instruct format:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>
{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
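The template above can be assembled programmatically. The following is a minimal sketch, not an official implementation: `build_prompt` and `format_header` are hypothetical helper names, and the blank line (two newlines) after each `<|end_header_id|>` is an assumption taken from the commonly published Llama 3 chat template, since this card only shows a line break at that position.

```python
from typing import Dict, List

BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def format_header(role: str) -> str:
    # Assumption: the role header is followed by a blank line (two newlines).
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def build_prompt(system_prompt: str, turns: List[Dict[str, str]]) -> str:
    """Build a Llama 3 instruct prompt from a system prompt and chat turns.

    Each turn is a dict with a "role" ("user" or "assistant") and "content".
    The result always ends with an open assistant header so the model
    generates the next assistant reply.
    """
    parts = [BOS, format_header("system"), system_prompt, EOT]
    for turn in turns:
        parts.append(format_header(turn["role"]))
        parts.append(turn["content"])
        parts.append(EOT)
    # Open the assistant turn that the model is expected to complete.
    parts.append(format_header("assistant"))
    return "".join(parts)


if __name__ == "__main__":
    example = build_prompt(
        "You are a helpful assistant.",
        [{"role": "user", "content": "What is ORPO?"}],
    )
    print(example)
```

Many runtimes apply the chat template stored in the GGUF metadata automatically, so building the string by hand is mainly useful when calling a raw completion endpoint.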
Quants: