---
language:
  - multilingual
license: gemma
library_name: transformers
tags:
  - nlp
  - code
base_model: google/gemma-2-2b-jpn-it
license_link: https://ai.google.dev/gemma/terms
pipeline_tag: text-generation
quantized_by: ymcki
widget:
  - messages:
      - role: user
        content: Can you provide ways to eat combinations of bananas and dragonfruits?
model-index:
  - name: gemma-2-2b-jpn-it-abliterated-18
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 0
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 2.48
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1.23
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 2.08
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 1.86
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18
          name: Open LLM Leaderboard
---

Original model: https://huggingface.co/google/gemma-2-2b-jpn-it

## Prompt format

```
<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
<end_of_turn>
<start_of_turn>model
```

Note that this model does not support a System prompt.
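For reference, the same layout (with a leading `<bos>` token) is what the tokenizer's chat template produces, as in this minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ymcki/gemma-2-2b-jpn-it-abliterated-18")
chat = [{"role": "user", "content": "Write a hello world program"}]

# Prints the prompt in the format shown above, prefixed with <bos>
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))
```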

This is an abliterated version of [google/gemma-2-2b-jpn-it](https://huggingface.co/google/gemma-2-2b-jpn-it), created using the abliteration method described by mlabonne.

Layer 18 of the original model was chosen for abliteration. I also created a layer 17 abliterated model for comparison.
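For readers unfamiliar with the technique: abliteration estimates a "refusal direction" from the model's activations (here at layer 18) and removes that direction from the weights that write into the residual stream. The snippet below is only a rough sketch of the orthogonalization step, not the actual script used for this model; `refusal_dir` is assumed to have been computed separately from contrasting prompt sets.

```python
import torch

def orthogonalize(weight: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Illustrative only: remove the refusal direction from a weight matrix
    (shape [d_model, d_in]) that writes into the residual stream."""
    r = refusal_dir / refusal_dir.norm()        # unit vector, shape [d_model]
    return weight - torch.outer(r, r @ weight)  # subtract the component along r
```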

It is uploaded here to be evaluated by the Open LLM Leaderboard to see how brain damaged it is compared to the original model.

ORPO fine-tuning is currently underway to see if it can regain its sanity. You can play with this model now, or wait until I am done with the fine-tuning.
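For context, a minimal ORPO fine-tuning loop with the TRL library could look roughly like the sketch below; the dataset and hyperparameters are placeholders for illustration, not my actual training setup.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "ymcki/gemma-2-2b-jpn-it-abliterated-18"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# ORPO needs preference data with "chosen" and "rejected" columns;
# this dataset is just a placeholder example.
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")

args = ORPOConfig(
    output_dir="gemma-2-2b-jpn-it-abliterated-18-ORPO",
    beta=0.1,
    max_length=1024,
    per_device_train_batch_size=1,
    num_train_epochs=1,
)
trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # called `tokenizer=` in older versions of trl
)
trainer.train()
```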

## How to run this model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the hub id directly, or the local path from the download step below
model_id = "ymcki/gemma-2-2b-jpn-it-abliterated-18"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# The chat template already prepends <bos>, so don't add special tokens again
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```sh
pip install -U "huggingface_hub[cli]"
```

Then you can download the model files to a local directory:

```sh
huggingface-cli download ymcki/gemma-2-2b-jpn-it-abliterated-18 --include "*" --local-dir ./
```
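Alternatively, the same download can be done from Python with `huggingface_hub` (the target directory here is just an example):

```python
from huggingface_hub import snapshot_download

# Download every file in the repository to a local folder
snapshot_download(
    repo_id="ymcki/gemma-2-2b-jpn-it-abliterated-18",
    local_dir="./gemma-2-2b-jpn-it-abliterated-18",
)
```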

## Credits

Thank you mlabonne for describing his abliteration method.

## Open LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ymcki/gemma-2-2b-jpn-it-abliterated-18).

| Metric              | Value |
|---------------------|------:|
| Avg.                |  1.28 |
| IFEval (0-Shot)     |  0.00 |
| BBH (3-Shot)        |  2.48 |
| MATH Lvl 5 (4-Shot) |  0.00 |
| GPQA (0-shot)       |  1.23 |
| MuSR (0-shot)       |  2.08 |
| MMLU-PRO (5-shot)   |  1.86 |