---
license: llama3
pipeline_tag: text-generation
base_model: OwenArli/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1
---

# QuantFactory/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1-GGUF
This is a quantized version of [OwenArli/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1](https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-ORPO-v0.1), created using llama.cpp.
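
The quantized files can be run with any llama.cpp-based runtime. Below is a minimal sketch using the `llama-cpp-python` bindings; the `.gguf` filename is an assumption, so substitute whichever quant file you actually download from this repo.

```python
# Minimal sketch with llama-cpp-python; the quant filename below is an
# assumption -- replace it with the .gguf file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="ArliAI-Llama-3-8B-Instruct-ORPO-v0.1.Q4_K_M.gguf",  # hypothetical filename
    n_ctx=8192,       # Llama 3 8B supports an 8k context window
    n_gpu_layers=-1,  # offload all layers to GPU if one is available
)

# create_chat_completion applies the chat template stored in the GGUF metadata
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain ORPO fine-tuning in one sentence."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```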

# Model Description
This model is based on Meta-Llama-3-8B-Instruct and is governed by the Meta Llama 3 License agreement:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

The model was fine-tuned with the ORPO method on the following datasets (a minimal training sketch follows the list):
- https://huggingface.co/datasets/Intel/orca_dpo_pairs
- https://huggingface.co/datasets/argilla/distilabel-math-preference-dpo
- https://huggingface.co/datasets/unalignment/toxic-dpo-v0.2
- https://huggingface.co/datasets/M4-ai/prm_dpo_pairs_cleaned
- https://huggingface.co/datasets/jondurbin/truthy-dpo-v0.1
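
For reference, ORPO (Odds Ratio Preference Optimization) trains directly on preference pairs without a separate reward model. The following is an illustrative sketch of such a run using the TRL library; it is not the author's actual training script, and all hyperparameters are assumptions.

```python
# Illustrative ORPO sketch with HF's TRL library -- not the author's
# actual training code; hyperparameters are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# ORPO consumes DPO-style preference pairs; depending on the dataset you may
# need to rename columns to the expected prompt/chosen/rejected schema.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")

config = ORPOConfig(
    output_dir="llama3-8b-orpo",
    beta=0.1,              # weight of the odds-ratio preference term (assumed)
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```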

Despite the toxic dataset included to reduce refusals, this model is still relatively safe; it simply refuses less often than the original Meta model.

As of now, ORPO fine-tuning seems to improve some metrics while substantially degrading others:

![OpenLLM leaderboard results](https://cdn-uploads.huggingface.co/production/uploads/65a5c0e82823ba72ed2cee7d/QdU23DGzcGgMxW82x0JbW.png)

Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
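
Runtimes such as llama.cpp normally apply this template automatically from the GGUF metadata. If yours does not, a small helper like this illustrative sketch can assemble the prompt:

```python
# Minimal helper that assembles the instruct format shown above from a list
# of {"role", "content"} messages; only needed when a runtime does not
# apply the chat template for you.
def build_llama3_prompt(messages: list[dict]) -> str:
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += f"{msg['content']}<|eot_id|>"
    # trailing assistant header cues the model to generate its reply
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```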

Quants: