Multilingual Web datasets

Occiglot
AI & ML interests
Open Source Language Models for Europe
Organization Card
Occiglot is an ongoing open research project for multilingual language models.
If you want to train a model for your own language or are working on evaluations, please contact us or join our Discord server. We are actively seeking collaborations!
Models (10)

occiglot/occiglot-7b-es-en-instruct: Text Generation, 63 downloads, 2 likes
occiglot/occiglot-7b-eu5: Text Generation, 202 downloads, 27 likes
occiglot/occiglot-7b-de-en-instruct: Text Generation, 629 downloads, 24 likes
occiglot/occiglot-7b-eu5-instruct: Text Generation, 510 downloads, 9 likes
occiglot/occiglot-7b-it-en-instruct: Text Generation, 7.07k downloads, 5 likes
occiglot/occiglot-7b-fr-en-instruct: Text Generation, 39 downloads, 3 likes
occiglot/occiglot-7b-it-en: Text Generation, 53 downloads, 6 likes
occiglot/occiglot-7b-fr-en: Text Generation, 996 downloads, 2 likes
occiglot/occiglot-7b-de-en: Text Generation, 95 downloads, 8 likes
occiglot/occiglot-7b-es-en: Text Generation, 1.54k downloads, 4 likes

Datasets (6)

occiglot/euro-llm-leaderboard-requests: 2.14k downloads, 2 likes
occiglot/arcX: 1.17k downloads, 67 likes
occiglot/hellaswagX: 9.98k downloads, 64 likes
occiglot/occiglot-fineweb-v1.0: 2.73k downloads, 3 likes
occiglot/occiglot-fineweb-v0.5: 226M rows, 201 downloads, 15 likes
occiglot/tokenizer-wiki-bench: 84.4M rows, 7.43k downloads, 5 likes