
gonzalo-santamaria-iic committed
Commit 85fbd11 · verified · 1 Parent(s): b052992

Update README.md

Files changed (1):
  1. README.md +26 -42
README.md CHANGED
@@ -24,11 +24,11 @@ Key benefits of this model include:
  - Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
  - Possibility of using it in different hardware requirements, especially those with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).

- Remarkably, this model was trained on a single A100 GPU with limited computational resources, yet achieved its current state in a relatively short time (less than 12 hours). This feat was made possible by leveraging a high-quality dataset and employing advanced techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) to optimize memory usage. Further details on the training process can be found below.

  - **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
  - **Model type:** Generative Fine-tuned Transformer.
- - **Language(s) (NLP):** Spanish.
  - **License:** CC BY NC 4.0.
  - **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

@@ -126,19 +126,18 @@ generated_ids = [
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```

- For a better experience, we recommend using [the following default generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).

  ## Training Details

  ### Training Data

- A combination of both public and private datasets designed in the IIC. The dataset consists of 21975 conversations in Spanish, with the format `chatml` and has the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants: `chosen` and `rejected`, where the only thing that changes is the last answer of the assistant. The last answer in the `chosen` variant is considered a better answer than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).

  ### Training Procedure

  We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we have applied [the script they have published](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) as an example for using DPO to the dataset we have generated.

-
  #### Training Hyperparameters

  ```shell
@@ -170,8 +169,8 @@ DPO_CONFIG = {
  "per_device_eval_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "learning_rate": 5e-6,
- "max_length": 8192,
- "max_prompt_length": 6656,
  "gradient_checkpointing": True,
  "weight_decay": 0.001,
  "optim": "rmsprop",
@@ -181,11 +180,17 @@ DPO_CONFIG = {
  }
  ```

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

@@ -219,12 +224,6 @@ DPO_CONFIG = {



- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
@@ -237,8 +236,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]
-
  ### Model Architecture and Objective

  [More Information Needed]
@@ -255,31 +252,18 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  [More Information Needed]

- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]

- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]

  ## Model Card Contact

  - Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
  - Possibility of using it in different hardware requirements, especially those with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).

+ Remarkably, this model was trained on a single A100 GPU with limited computational resources, yet achieved its current state in a relatively short time (8.5 hours). This feat was made possible by leveraging a high-quality dataset and employing advanced techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) to optimize memory usage. Further details on the training process can be found below.

  - **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
  - **Model type:** Generative Fine-tuned Transformer.
+ - **Language(s) (NLP):** Spanish (BCP-47 es).
  - **License:** CC BY NC 4.0.
  - **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
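
To illustrate the reduced-hardware option mentioned above, here is a minimal sketch of running the GGUF conversion with the `llama-cpp-python` bindings. The runtime choice, the quantization filename and the generation settings are assumptions, not recommendations from this card; check [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF) for the actual files and instructions.

```python
from llama_cpp import Llama

# Hypothetical quantization filename; list the files in the GGUF repo to pick one.
llm = Llama.from_pretrained(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="*Q4_K_M.gguf",  # glob over the assumed quantization variant
    n_ctx=4096,               # context window; adjust to the memory available
)

messages = [{"role": "user", "content": "¿Qué es la ingeniería del conocimiento?"}]
out = llm.create_chat_completion(messages=messages)
print(out["choices"][0]["message"]["content"])
```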
 
 
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```

+ For a better experience, we recommend using [the following generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).
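
The recommended defaults live in the model's `generation_config.json`, so they can be loaded and passed to `generate` explicitly instead of being copied by hand. A minimal sketch, assuming `model` and `model_inputs` are prepared as in the snippet above; the `max_new_tokens` value is illustrative:

```python
from transformers import GenerationConfig

# Load the generation defaults shipped with the model repository.
gen_config = GenerationConfig.from_pretrained("IIC/RigoChat-7b-v2")

generated_ids = model.generate(
    **model_inputs,
    generation_config=gen_config,
    max_new_tokens=512,  # illustrative cap, not a value from this card
)
```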
 
  ## Training Details

  ### Training Data

+ A combination of public and private datasets designed at the IIC. The dataset consists of 21,975 conversations in Spanish in `chatml` format, with the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's final answer: the answer in the `chosen` variant is considered better than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).
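
To make the format concrete, a single record along the lines described above might look as follows. This is an invented illustration of the structure (chatml-style turns with `chosen`/`rejected` variants that differ only in the final assistant answer), not an actual row from the dataset:

```python
example_record = {
    "chosen": [
        {"role": "user", "content": "¿Puedes resumir este contrato en dos frases?"},
        {"role": "assistant", "content": "Claro: el contrato regula..."},  # preferred final answer
    ],
    "rejected": [
        {"role": "user", "content": "¿Puedes resumir este contrato en dos frases?"},
        {"role": "assistant", "content": "No puedo ayudarte con eso."},  # dispreferred final answer
    ],
}
```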
 
  ### Training Procedure

  We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we have applied [the script they have published](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) as an example for using DPO to the dataset we have generated.

  #### Training Hyperparameters

  ```shell
 
  "per_device_eval_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "learning_rate": 5e-6,
+ "max_length": 8192, # max length of the chat history plus the latest assistant response.
+ "max_prompt_length": 6656, # max length of the chat history: user-assistant-...-assistant-user.
  "gradient_checkpointing": True,
  "weight_decay": 0.001,
  "optim": "rmsprop",

  }
  ```
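
Putting the configuration above together with TRL, a run of this kind can be sketched as follows. This is a reconstruction under stated assumptions, not the exact training script: the preference dataset is private (a placeholder path is used) and the LoRA settings are illustrative; the TRL example script linked above is the authoritative recipe.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: the IIC preference dataset (chosen/rejected conversations) is not public.
train_dataset = load_dataset("json", data_files="preference_data.jsonl", split="train")

# Values mirror the DPO_CONFIG block above; anything not shown there is illustrative.
args = DPOConfig(
    output_dir="rigochat-7b-v2-dpo",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    max_length=8192,
    max_prompt_length=6656,
    gradient_checkpointing=True,
    weight_decay=0.001,
    optim="rmsprop",
    num_train_epochs=2,
)

# Illustrative LoRA adapter; the card only states that LoRA was used, not its exact settings.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases name this argument `tokenizer`
    peft_config=peft_config,
)
trainer.train()
```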

+ #### Speeds, Sizes, Times

+ Below are some useful figures from the latest training logs.

+ ```python
+ latest_logs = {'loss': 0.3716, 'grad_norm': 4.989994049072266, 'learning_rate': 1.0380020311950844e-10, 'rewards/chosen': 0.534086287021637, 'rewards/rejected': -0.6236276030540466, 'rewards/accuracies': 0.8899999856948853, 'rewards/margins': 1.1577140092849731, 'logps/rejected': -218.88198852539062, 'logps/chosen': -250.0700225830078, 'logits/rejected': -1.6214849948883057, 'logits/chosen': -1.9585875272750854, 'epoch': 1.99}
+
+ final_training_results = {'train_runtime': 30825.7138, 'train_samples_per_second': 1.432, 'train_steps_per_second': 0.089, 'train_loss': 0.483570138469306, 'epoch': 2.0}
+ ```
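
As a quick sanity check, the headline figures are consistent with each other and with the dataset size:

```python
runtime_s = 30825.7138   # final_training_results["train_runtime"]
print(runtime_s / 3600)  # ≈ 8.56, roughly eight and a half hours
print(1.432 * runtime_s) # ≈ 44,142 samples processed ≈ 2 epochs x 21,975 conversations
```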

+ As the training time shows, in eight and a half hours we managed to improve a state-of-the-art model on tasks adapted to Spanish, using very little hardware. This is covered in more detail in the following sections.

  ## Evaluation



  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]

  ### Model Architecture and Objective

  [More Information Needed]

  [More Information Needed]

+ ## Citation

+ ```
+ @misc{iic_rigochat_7b_v2_2024,
+   author    = { {Instituto de Ingeniería del Conocimiento} },
+   title     = { Adapting a language model to Spanish using a dataset and reduced hardware },
+   year      = 2024,
+   url       = { https://huggingface.co/datasets/IIC/RigoChat-7b-v2 },
+   doi       = { 10.57967/hf/2043 },
+   publisher = { HF中国镜像站 }
+ }
+ ```

  ## Model Card Contact