Russian Jokes Generator

This repository contains three versions of a Transformer-based language model fine-tuned on a dataset of Russian jokes (anecdotes). The models are designed to generate humorous and coherent Russian text. Three branches are available: "main" (the nano model), "mini", and "small". The repository also includes a byte-level BPE tokenizer pretrained on the same dataset. The most coherent and capable model is the small one.

Model Details

Model Architecture

Each model is a Transformer with either ALiBi or RoPE (Rotary Positional Embeddings) positional encoding, Grouped-Query Attention (GQA), and SwiGLU activation. Two of the three models (mini and small) were trained with Multi-Head Latent Attention (MHLA); a sketch of the SwiGLU block follows the list below.

There are three versions:

  • Nano: 3 layers, 4 heads, 96 hidden dimensions.
  • Mini: 6 layers, 6 heads, 384 hidden dimensions. Trained with RoPE and MHLA.
  • Small: 12 layers, 12 heads, 768 hidden dimensions. Trained with RoPE and MHLA.
  • Tokenizer: Byte-level BPE tokenizer trained on the Russian jokes dataset.
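
For reference, here is a minimal sketch of a SwiGLU feed-forward block. It is an illustration only; the class and parameter names below are assumptions, not the repository's actual code.

import torch
import torch.nn as nn

class SwiGLU(nn.Module):
    # Illustrative SwiGLU feed-forward block: down( SiLU(gate(x)) * up(x) ).
    def __init__(self, hidden_dim: int, ffn_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.w_up = nn.Linear(hidden_dim, ffn_dim, bias=False)
        self.w_down = nn.Linear(ffn_dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(nn.functional.silu(self.w_gate(x)) * self.w_up(x))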

Training Details

  1. Training epochs: one epoch is a full pass over the dataset; the number of epochs is controlled by the n_step parameter passed to the Trainer at initialization. The nano and mini models were trained for 1 epoch each, the small model for 6 epochs.

  2. Batch size: 32 for nano and mini, 64 for small.

  3. Learning rate: 5e-4 with cosine decay for small, 3e-4 for nano and mini (a sketch of this setup follows the list).

  4. Loss: cross-entropy.

  5. Hardware: NVIDIA A100 GPU on Google Colab.
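
Below is a minimal sketch of the optimization setup for the small model (batch size 64, learning rate 5e-4 with cosine decay, cross-entropy loss). The model, data, and step count are stand-ins, and the optimizer choice is an assumption; the actual Trainer implementation lives in the repository.

import torch
import torch.nn as nn

vocab_size, hidden_dim = 1024, 768        # hypothetical sizes for this sketch
model = nn.Sequential(nn.Embedding(vocab_size, hidden_dim), nn.Linear(hidden_dim, vocab_size))
total_steps = 100                         # hypothetical; the real run is controlled by n_step
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
loss_fn = nn.CrossEntropyLoss()

for step in range(total_steps):
    tokens = torch.randint(0, vocab_size, (64, 128))   # stand-in batch of token ids
    logits = model(tokens[:, :-1])                     # predict the next token at each position
    loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                   # cosine decay of the learning rate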

Performance

Model   Training Loss (min)   Validation Loss (min)
Nano    3.784                 3.932
Mini    3.127                 3.144
Small   2.933                 3.025

Nano Plots

(Plot: training and validation loss for the nano model.)

Metric            Min     Max     Current
epoch             0.000   1.000   1.000
training loss     3.784   6.952   3.900
validation loss   3.932   4.902   3.932

Mini Plots

(Plot: training and validation loss for the mini model.)

Metric            Min     Max     Current
epoch             0.000   1.000   1.000
training loss     3.127   7.000   3.278
validation loss   3.144   4.500   3.144

Small Plots

(Plot: training and validation loss for the small model.)

Metric            Min     Max     Current
epoch             0.000   6.000   6.000
training loss     2.236   7.078   2.328
validation loss   2.568   4.657   2.568

Usage

You can load a model and the tokenizer from the corresponding branch: nano is on the main branch, mini on the mini branch, and small on the small branch. For example:

# Small model
model_small = TransformerForCausalLM.from_pretrained("estnafinema0/russian-jokes-generator", revision="small")
tokenizer = ByteLevelBPETokenizer.from_pretrained("estnafinema0/russian-jokes-generator")

To generate text from an initial prompt:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model_small = model_small.to(device)

text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)
model_output = model_small.generate(
    input_ids[None, :], max_new_tokens=200, eos_token_id=tokenizer.eos_token_id, do_sample=True, top_k=10
)
print(tokenizer.decode(model_output[0].tolist()))

Example Generations

Here are some example jokes generated by the small model:

  1. Input: "Пришел Петя в баню и говорит"

    Output: "Пришел Петя в баню и говорит - Василий Иванович, вы знаете, кто я - Петя, или Петя? - Ахааха, и я - Ахаилая, я - Ахаил! - А какая Петя? - Я - Ахаилая! - Ну и я, когда я банкрот, банкротство, конечно..."

  2. Input: "Вышел как-то на крыльцо"

    Output: "Вышел как-то на крыльцо, а там плачет. Стукнулся: упал, выпал. Плачет – упал."

  3. Input: "Священник задает ребёнку вопрос"

    Output: "Священник задает ребёнку вопрос ему на ухо:- Что, братан, опять несёл?- Братан, ты что, братан, охуел?"

License

This model is licensed under the Apache 2.0 License.

This model has been pushed to the Hub using the PytorchModelHubMixin integration:

  • Library: [More Information Needed]