See axolotl config

axolotl version: 0.8.0.dev0

base_model: mistralai/Mistral-7B-Instruct-v0.3
# optionally might have model_type or tokenizer_type
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
# Automatically upload checkpoint and final model to HF
hub_model_id: AiAF/Pretraining-SCPWiki-032025-7B-Instruct

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl
    # type: completion
    # text_column: text # column in dataset with the data, usually `text`
    type: completion
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/qlora-out/Pretraining-SCPWiki-032025-7B-Instruct-V1

adapter: qlora
lora_model_dir:

sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: "LLM-Pretraining"
wandb_entity:
wandb_watch: "all"
wandb_name: "Pretraining-SCPWiki-032025-7B-Instruct-V1"
wandb_log_model: "false"

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

warmup_steps: 10
evals_per_epoch: 20
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 20
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
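
To give a sense of how small the adapter is relative to the base model, the LoRA settings in the config (`lora_r: 16`, all seven linear projections targeted) can be turned into a rough trainable-parameter count. This is a sketch only: the layer shapes below are assumptions taken from the published Mistral-7B architecture (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers), not from this card.

```python
# Rough count of trainable LoRA parameters for the adapter settings in the config.
# ASSUMPTION: layer shapes come from the published Mistral-7B architecture,
# not from this model card.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size of the MLP
KV_DIM = 1024          # num_key_value_heads (8) * head_dim (128)
NUM_LAYERS = 32        # num_hidden_layers
LORA_R = 16            # lora_r from the config

# (in_features, out_features) for each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
```

Under these assumptions the adapter has about 41.9 million trainable weights, well under 1% of the roughly 7 billion parameters in the base model.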

Pretraining-SCPWiki-032025-7B-Instruct

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl dataset. It achieves the following results on the evaluation set:

Loss: 1.5048

Model description

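A QLoRA adapter for mistralai/Mistral-7B-Instruct-v0.3, trained with axolotl 0.8.0.dev0. The base model is loaded in 4-bit, and LoRA (r=16, alpha=32, dropout=0.05) is applied to the linear projection layers listed in the config above.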

Intended uses & limitations

More information needed
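
As a starting point for trying the model, the sketch below loads the base model in 4-bit and attaches the adapter with PEFT. It assumes the Hub repository holds a PEFT adapter rather than merged weights, and that `torch`, `transformers`, `peft`, and `bitsandbytes` are installed; `load_adapter` and `complete` are hypothetical helper names. Because training used raw completion text, the prompt is passed as plain text rather than through a chat template.

```python
def load_adapter(
    base_id: str = "mistralai/Mistral-7B-Instruct-v0.3",
    adapter_id: str = "AiAF/Pretraining-SCPWiki-032025-7B-Instruct",
):
    """Load the 4-bit base model and attach the adapter (hypothetical helper)."""
    # Imports are local so this file can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=quant,
        device_map="auto",
    )
    # ASSUMPTION: adapter_id points at a PEFT adapter, not merged weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model


def complete(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Continue a plain-text prompt; 128 mirrors eval_max_new_tokens in the config."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

A call such as `tokenizer, model = load_adapter()` followed by `complete(tokenizer, model, "some text")` would download both repositories on first use and needs a GPU with enough memory for the 4-bit base model.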

Training and evaluation data

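Training used the AiAF/Pretraining-SCPWiki-032025-7B-Instruct-pretraining.jsonl dataset as raw completion text (`type: completion`). 10% of the data was held out as the evaluation set (`val_set_size: 0.1`), and samples were packed into sequences of 8192 tokens.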

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 1.0
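
The relationship between these values can be checked with a little arithmetic. The sketch below reproduces the total batch size of 8 and shows the shape of a linear-warmup, cosine-decay learning rate. Two numbers are assumptions, not facts from this card: a single training device (inferred from 2 × 4 × 1 = 8) and a total of about 628 optimizer steps (estimated from the results table, where step 608 falls at epoch 0.9678). `lr_at` is an illustrative function, not the trainer's own scheduler code.

```python
import math

MICRO_BATCH_SIZE = 2   # train_batch_size per device
GRAD_ACCUM_STEPS = 4   # gradient_accumulation_steps
NUM_DEVICES = 1        # ASSUMPTION: inferred from 2 * 4 * 1 = 8
SEQUENCE_LEN = 8192    # sequence_len from the axolotl config
BASE_LR = 2e-4         # learning_rate
WARMUP_STEPS = 10      # lr_scheduler_warmup_steps
TOTAL_STEPS = 628      # ASSUMPTION: estimated from the results table


def effective_batch_size(micro: int, accum: int, devices: int) -> int:
    """Sequences contributing to one optimizer step."""
    return micro * accum * devices


def lr_at(step: int, base_lr: float = BASE_LR,
          warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


batch = effective_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES)
# Sequences are packed and padded to SEQUENCE_LEN, so this is an upper bound
# on tokens per optimizer step (padding included).
max_tokens_per_step = batch * SEQUENCE_LEN

print(f"effective batch size: {batch}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
for step in (0, 5, 10, 319, 628):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

With these assumptions the learning rate peaks at 0.0002 after the 10 warmup steps, passes through half that value near the midpoint of training, and approaches zero at the final step.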

Training results

Training Loss	Epoch	Step	Validation Loss
2.0192	0.0016	1	1.9469
1.3794	0.0509	32	1.5979
1.5383	0.1019	64	1.5626
1.3583	0.1528	96	1.5544
1.3354	0.2037	128	1.5393
1.4771	0.2547	160	1.5319
1.4542	0.3056	192	1.5262
1.2767	0.3565	224	1.5228
1.3347	0.4075	256	1.5202
1.4451	0.4584	288	1.5169
1.1028	0.5094	320	1.5147
1.315	0.5603	352	1.5126
1.3244	0.6112	384	1.5106
1.3915	0.6622	416	1.5089
1.3156	0.7131	448	1.5077
1.2967	0.7640	480	1.5067
1.4046	0.8150	512	1.5056
1.4017	0.8659	544	1.5052
1.2678	0.9168	576	1.5050
1.231	0.9678	608	1.5048
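
Validation loss can be easier to interpret as perplexity. The sketch below converts a few rows of the table above, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language model training, though the card does not state it).

```python
import math

# (step, validation loss) pairs copied from the results table.
VAL_LOSS = [(1, 1.9469), (32, 1.5979), (320, 1.5147), (608, 1.5048)]


def perplexity(loss: float) -> float:
    """Perplexity is exp(mean cross-entropy) when the loss is in nats."""
    return math.exp(loss)


for step, loss in VAL_LOSS:
    print(f"step {step:>3}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption, validation perplexity falls from roughly 7.0 at the first step to roughly 4.5 at the last evaluation, with most of the improvement in the first 32 steps.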

Framework versions

PEFT 0.14.0
Transformers 4.49.0
Pytorch 2.5.1+cu124
Datasets 3.2.0
Tokenizers 0.21.0
