gemma-2b-orpo-GGUF

This is a GGUF-quantized version of the gemma-2b-orpo model: an ORPO fine-tune of google/gemma-2b.

You can find more information, including evaluation results and the training/usage notebook, in the gemma-2b-orpo model card.

🎮 Model in action

The model runs with all the libraries that are part of the llama.cpp ecosystem.
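For example, you can fetch the quantized file once and point any of those tools at it. A minimal sketch using huggingface_hub (not required by the example below; the file name comes from this repo):

```python
from huggingface_hub import hf_hub_download

# Download the 5-bit quantized file from this repo;
# the returned local path can be passed to any llama.cpp-based tool.
model_path = hf_hub_download(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
)
print(model_path)
```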

If you need to apply the prompt template manually, take a look at the tokenizer_config.json of the original model.
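As a rough sketch, the ChatML-style format used in the text-generation example below can be assembled like this (`format_prompt` is a hypothetical helper; tokenizer_config.json remains the authoritative source for the template):

```python
# Hypothetical helper mirroring the ChatML-style template
# used in the text-generation example below.
def format_prompt(messages):
    prompt = "<bos>"
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

print(format_prompt([{"role": "user", "content": "Name the planets in the solar system"}]))
```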

📱 Run the model on a budget smartphone -> see my recent post

Here is a simple example with llama-cpp-python:

```bash
pip install llama-cpp-python
```

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="anakin87/gemma-2b-orpo-GGUF",
    filename="gemma-2b-orpo.Q5_K_M.gguf",
    verbose=True,  # due to a known bug, verbose must be True
)

# text generation - prompt template applied manually
llm(
    "<bos><|im_start|>user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=75,
)

# chat completion - prompt template applied automatically
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Please list some places to visit in Italy"}
    ]
)
```
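Both calls return OpenAI-style dictionaries, so the generated text can be pulled out like this:

```python
# Text completion: the generated text is under choices[0]["text"]
completion = llm(
    "<bos><|im_start|>user\nName the planets in the solar system<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=75,
)
print(completion["choices"][0]["text"])

# Chat completion: the reply is under choices[0]["message"]["content"]
chat = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Please list some places to visit in Italy"}]
)
print(chat["choices"][0]["message"]["content"])
```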

GGUF details

- Model size: 2.51B params
- Architecture: gemma
- Quantization: 5-bit (Q5_K_M)
