
gonzalo-santamaria-iic committed
Commit 85fbd11 · verified · 1 Parent(s): b052992

Update README.md

Files changed (1):
  1. README.md +26 -42
README.md CHANGED
@@ -24,11 +24,11 @@ Key benefits of this model include:
  - Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
  - Possibility of using it in different hardware requirements, especially those with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).

- Remarkably, this model was trained on a single A100 GPU with limited computational resources, yet achieved its current state in a relatively short time (less than 12 hours). This feat was made possible by leveraging a high-quality dataset and employing advanced techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) to optimize memory usage. Further details on the training process can be found below.

  - **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
  - **Model type:** Generative Fine-tuned Transformer.
- - **Language(s) (NLP):** Spanish.
  - **License:** CC BY NC 4.0.
  - **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

@@ -126,19 +126,18 @@ generated_ids = [
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```

- For a better experience, we recommend using [the following default generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).

  ## Training Details

  ### Training Data

- A combination of both public and private datasets designed in the IIC. The dataset consists of 21975 conversations in Spanish, with the format `chatml` and has the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants: `chosen` and `rejected`, where the only thing that changes is the last answer of the assistant. The last answer in the `chosen` variant is considered a better answer than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).

  ### Training Procedure

  We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we have applied [the script they have published](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) as an example for using DPO to the dataset we have generated.

-
  #### Training Hyperparameters

  ```shell
@@ -170,8 +169,8 @@ DPO_CONFIG = {
  "per_device_eval_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "learning_rate": 5e-6,
- "max_length": 8192,
- "max_prompt_length": 6656,
  "gradient_checkpointing": True,
  "weight_decay": 0.001,
  "optim": "rmsprop",
@@ -181,11 +180,17 @@ DPO_CONFIG = {
  }
  ```

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

@@ -219,12 +224,6 @@ DPO_CONFIG = {



- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
@@ -237,8 +236,6 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]

- ## Technical Specifications [optional]
-
  ### Model Architecture and Objective

  [More Information Needed]
@@ -255,31 +252,18 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]

  [More Information Needed]

- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]

- [More Information Needed]
-
- ## Model Card Authors [optional]
-
- [More Information Needed]

  ## Model Card Contact

  - Enhanced safety and reduced hallucinations in RAG systems with Spanish texts.
  - Possibility of using it in different hardware requirements, especially those with reduced computational capacity. For more information on how to use RigoChat-7b-v2 on reduced hardware, see [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF).

+ Remarkably, this model was trained on a single A100 GPU with limited computational resources, yet achieved its current state in a relatively short time (8.5 hours). This feat was made possible by leveraging a high-quality dataset and employing advanced techniques such as [LoRA](https://arxiv.org/pdf/2106.09685) to optimize memory usage. Further details on the training process can be found below.

  - **Developed by:** Instituto de Ingeniería del Conocimiento (IIC).
  - **Model type:** Generative Fine-tuned Transformer.
+ - **Language(s) (NLP):** Spanish (BCP-47 es).
  - **License:** CC BY NC 4.0.
  - **Finetuned from model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).
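
To illustrate the reduced-hardware option mentioned above, here is a minimal sketch of running the GGUF conversion with the `llama-cpp-python` bindings. The runtime choice, the quantization filename and the generation settings are assumptions, not recommendations from this card; check [IIC/RigoChat-7b-v2-GGUF](https://huggingface.co/IIC/RigoChat-7b-v2-GGUF) for the actual files and instructions.

```python
from llama_cpp import Llama

# Hypothetical quantization filename; list the files in the GGUF repo to pick one.
llm = Llama.from_pretrained(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="*Q4_K_M.gguf",  # glob over the assumed quantization variant
    n_ctx=4096,               # context window; adjust to the memory available
)

messages = [{"role": "user", "content": "¿Qué es la ingeniería del conocimiento?"}]
out = llm.create_chat_completion(messages=messages)
print(out["choices"][0]["message"]["content"])
```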
 
 
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
  ```

+ For a better experience, we recommend using [the following generation parameters](https://huggingface.co/IIC/RigoChat-7b-v2/blob/main/generation_config.json).
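
The recommended defaults live in the model's `generation_config.json`, so they can be loaded and passed to `generate` explicitly instead of being copied by hand. A minimal sketch, assuming `model` and `model_inputs` are prepared as in the snippet above; the `max_new_tokens` value is illustrative:

```python
from transformers import GenerationConfig

# Load the generation defaults shipped with the model repository.
gen_config = GenerationConfig.from_pretrained("IIC/RigoChat-7b-v2")

generated_ids = model.generate(
    **model_inputs,
    generation_config=gen_config,
    max_new_tokens=512,  # illustrative cap, not a value from this card
)
```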
 
  ## Training Details

  ### Training Data

+ A combination of public and private datasets designed at the IIC. The dataset consists of 21,975 conversations in Spanish in `chatml` format, with the same structure as the [Anthropic/hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf). Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's final answer: the answer in the `chosen` variant is considered better than the one in the `rejected` variant. Different techniques have been used to generate the dataset, which we explain in depth in the paper (**coming soon**).
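
To make the format concrete, a single record along the lines described above might look as follows. This is an invented illustration of the structure (chatml-style turns with `chosen`/`rejected` variants that differ only in the final assistant answer), not an actual row from the dataset:

```python
example_record = {
    "chosen": [
        {"role": "user", "content": "¿Puedes resumir este contrato en dos frases?"},
        {"role": "assistant", "content": "Claro: el contrato regula..."},  # preferred final answer
    ],
    "rejected": [
        {"role": "user", "content": "¿Puedes resumir este contrato en dos frases?"},
        {"role": "assistant", "content": "No puedo ayudarte con eso."},  # dispreferred final answer
    ],
}
```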
 
  ### Training Procedure

  We use the [Transformer Reinforcement Learning](https://huggingface.co/docs/trl/index) (TRL) library. Specifically, we have applied [the script they have published](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) as an example for using DPO to the dataset we have generated.

  #### Training Hyperparameters

  ```shell
 
  "per_device_eval_batch_size": 1,
  "gradient_accumulation_steps": 16,
  "learning_rate": 5e-6,
+ "max_length": 8192, # max length of the chat history plus the latest assistant response.
+ "max_prompt_length": 6656, # max length of the chat history: user-assistant-...-assistant-user.
  "gradient_checkpointing": True,
  "weight_decay": 0.001,
  "optim": "rmsprop",

  }
  ```
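
Putting the configuration above together with TRL, a run of this kind can be sketched as follows. This is a reconstruction under stated assumptions, not the exact training script: the preference dataset is private (a placeholder path is used) and the LoRA settings are illustrative; the TRL example script linked above is the authoritative recipe.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder: the IIC preference dataset (chosen/rejected conversations) is not public.
train_dataset = load_dataset("json", data_files="preference_data.jsonl", split="train")

# Values mirror the DPO_CONFIG block above; anything not shown there is illustrative.
args = DPOConfig(
    output_dir="rigochat-7b-v2-dpo",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    max_length=8192,
    max_prompt_length=6656,
    gradient_checkpointing=True,
    weight_decay=0.001,
    optim="rmsprop",
    num_train_epochs=2,
)

# Illustrative LoRA adapter; the card only states that LoRA was used, not its exact settings.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases name this argument `tokenizer`
    peft_config=peft_config,
)
trainer.train()
```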

+ #### Speeds, Sizes, Times

+ Below are some useful figures from the latest training logs.

+ ```python
+ latest_logs = {'loss': 0.3716, 'grad_norm': 4.989994049072266, 'learning_rate': 1.0380020311950844e-10, 'rewards/chosen': 0.534086287021637, 'rewards/rejected': -0.6236276030540466, 'rewards/accuracies': 0.8899999856948853, 'rewards/margins': 1.1577140092849731, 'logps/rejected': -218.88198852539062, 'logps/chosen': -250.0700225830078, 'logits/rejected': -1.6214849948883057, 'logits/chosen': -1.9585875272750854, 'epoch': 1.99}
+
+ final_training_results = {'train_runtime': 30825.7138, 'train_samples_per_second': 1.432, 'train_steps_per_second': 0.089, 'train_loss': 0.483570138469306, 'epoch': 2.0}
+ ```
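
As a quick sanity check, the headline figures are consistent with each other and with the dataset size:

```python
runtime_s = 30825.7138   # final_training_results["train_runtime"]
print(runtime_s / 3600)  # ≈ 8.56, roughly eight and a half hours
print(1.432 * runtime_s) # ≈ 44,142 samples processed ≈ 2 epochs x 21,975 conversations
```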

+ As the training time shows, in eight and a half hours we managed to improve a state-of-the-art model on tasks adapted to Spanish, using very little hardware. This is covered in more detail in the following sections.

  ## Evaluation



  ## Environmental Impact

  <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
  - **Compute Region:** [More Information Needed]
  - **Carbon Emitted:** [More Information Needed]

  ### Model Architecture and Objective

  [More Information Needed]

  [More Information Needed]

+ ## Citation

+ ```
+ @misc{iic_rigochat_7b_v2_2024,
+   author    = { {Instituto de Ingeniería del Conocimiento} },
+   title     = { Adapting a language model to Spanish using a dataset and reduced hardware },
+   year      = 2024,
+   url       = { https://huggingface.co/datasets/IIC/RigoChat-7b-v2 },
+   doi       = { 10.57967/hf/2043 },
+   publisher = { HF中国镜像站 }
+ }
+ ```

  ## Model Card Contact