Russian Jokes Generator
This repository contains three versions of a Transformer-based language model fine-tuned on a dataset of Russian jokes (anekdoty). The models are designed to generate humorous and coherent Russian text. Three branches are available: `main` (nano model), `mini`, and `small`.
The repository also includes a byte-level BPE tokenizer pretrained on the same dataset. The most coherent and powerful model is `small`.
Model Details
Model Architecture
Each model is a Transformer with ALiBi or RoPE (Rotary Positional Embedding) positional encodings, Grouped-Query Attention (GQA), and SwiGLU activation. Two of the three models (mini and small) were trained with Multi-Head Latent Attention (MHLA).
There are three versions:
- Nano: 3 layers, 4 heads, 96 hidden dimensions.
- Mini: 6 layers, 6 heads, 384 hidden dimensions. Trained with RoPE and MHLA.
- Small: 12 layers, 12 heads, 768 hidden dimensions. Trained with RoPE and MHLA.
- Tokenizer: Byte-level BPE tokenizer trained on the Russian jokes dataset.
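For reference, the three configurations can be summarized as follows. The field names (`n_layers`, `hidden_dim`, ...) are purely illustrative and are not taken from the repository's actual config class; nano's ALiBi/GQA entries are inferred from the architecture description above.
```python
# Illustrative summary of the three configurations described above.
# Field names are assumptions for readability, not the repository's real config API;
# nano's ALiBi/GQA values are inferred, not stated explicitly in the model card.
MODEL_CONFIGS = {
    "nano":  {"n_layers": 3,  "n_heads": 4,  "hidden_dim": 96,  "pos_emb": "ALiBi", "attention": "GQA"},
    "mini":  {"n_layers": 6,  "n_heads": 6,  "hidden_dim": 384, "pos_emb": "RoPE",  "attention": "MHLA"},
    "small": {"n_layers": 12, "n_heads": 12, "hidden_dim": 768, "pos_emb": "RoPE",  "attention": "MHLA"},
}
```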
Training Details
Training epochs are counted as full passes over the dataset and are derived from the `n_step` parameter passed when initializing the `Trainer`: 1 epoch each for the nano and mini models, and 6 for the small model.
- Batch size: 32 for nano and mini, 64 for small.
- Learning rate: 5e-4 with cosine decay for small, 3e-4 for nano and mini.
- Loss: cross-entropy.
- Hardware: NVIDIA A100 GPU (Google Colab).
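The `Trainer` itself is not reproduced here; the following is a minimal sketch of an equivalent PyTorch setup with the hyperparameters above, assuming AdamW and torch's built-in cosine schedule (both assumptions, not taken from the repository).
```python
import torch
import torch.nn as nn

# Minimal sketch of the training setup (values for the small model).
# AdamW and CosineAnnealingLR are assumptions standing in for the repository's Trainer.
vocab_size = 1024                      # placeholder; the real size comes from the BPE tokenizer
model = nn.Linear(768, vocab_size)     # stand-in for the Transformer
n_step = 10_000                        # total optimizer steps (the Trainer's n_step argument)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)   # 3e-4 for nano/mini
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=n_step)
loss_fn = nn.CrossEntropyLoss()        # cross-entropy over next-token predictions

# One illustrative optimization step on random data (batch size 64 for small).
logits = model(torch.randn(64, 768))
loss = loss_fn(logits, torch.randint(0, vocab_size, (64,)))
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```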
Performance
Model | Training Loss (min) | Validation Loss (min) |
---|---|---|
Nano | 3.784 | 3.932 |
Mini | 3.127 | 3.144 |
Small | 2.933 | 3.025 |
Nano Plots
Epoch:
Parameter | Min | Max | Cur |
---|---|---|---|
epoch | 0.000 | 1.000 | 1.000 |
Loss:
Parameter | Min | Max | Cur |
---|---|---|---|
training | 3.784 | 6.952 | 3.900 |
validation | 3.932 | 4.902 | 3.932 |
Mini Plots
Epoch:
Parameter | Min | Max | Cur |
---|---|---|---|
epoch | 0.000 | 1.000 | 1.000 |
Loss:
Parameter | Min | Max | Cur |
---|---|---|---|
training | 3.127 | 7.000 | 3.278 |
validation | 3.144 | 4.500 | 3.144 |
Small Plots
Epoch:
Parameter | Min | Max | Cur |
---|---|---|---|
epoch | 0.000 | 6.000 | 6.000 |
Loss:
Parameter | Min | Max | Cur |
---|---|---|---|
training | 2.236 | 7.078 | 2.328 |
validation | 2.568 | 4.657 | 2.568 |
Usage
You can load the models and the tokenizer. The nano model lives in the `main` branch, the mini model in the `mini` branch, and the small model in the `small` branch:
```python
# Small model (from the "small" branch); the model and tokenizer classes
# are provided by this repository's accompanying code.
model_small = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="small")
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")
```
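The nano and mini checkpoints should load the same way by pointing `revision` at the corresponding branch (shown here for illustration, following the pattern above):
```python
# Nano lives in the default "main" branch, mini in the "mini" branch.
model_nano = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator")
model_mini = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="mini")
```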
To generate text from an initial prompt:
text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = check_model.generate(
input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
)
tokenizer.decode(model_output[0].tolist())
Example Generations
Here are some example jokes generated by the small model:
Input: "Пришел Петя в баню и говорит"
Output: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."
Input: "Вышел как-то на крыльцо"
Output: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."
Input: "Священник задает ребёнку вопрос" Output: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"
License
This model is licensed under the Apache 2.0 License.
This model has been pushed to the Hub using the PytorchModelHubMixin integration:
- Library: [More Information Needed]