Add link to paper and project page (#3)
- Add link to paper and project page (a357c910307598c6ffacd79d1fa2c68594bfa4cb)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED

@@ -1,10 +1,10 @@
 ---
-library_name: transformers
-license: apache-2.0
-language:
-- en
 base_model:
 - HuggingFaceTB/SmolVLM-256M-Instruct
+language:
+- en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 ---
 
@@ -16,6 +16,8 @@ pipeline_tag: image-text-to-text
 </div>
 </div>
 
+This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).
+
 ### 🚀 Features:
 - 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags, an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
 - 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
@@ -39,7 +41,6 @@ pipeline_tag: image-text-to-text
 - 🧪 **Chemical Recognition**
 - 📙 **Datasets**
 
-
 ## ⌨️ Get started (code examples)
 
 You can use **transformers** or **vllm** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert results to a variety of output formats (md, html, etc.):
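The context line above points at the two inference paths documented later in the README (transformers and vllm), plus Docling for format conversion. As a hedged illustration of the transformers path, not the README's full example: the model ID comes from this repo, while the image path, prompt wording, and generation settings below are placeholder assumptions.

```python
# Minimal transformers inference sketch (illustrative; see the README's full example).
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

MODEL_ID = "ds4sd/SmolDocling-256M-preview"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Placeholder input: any rendered document page works (local path or URL).
image = load_image("path/to/page_image.png")

# Build the chat prompt with an image slot followed by the conversion instruction.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this page to docling."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

# Generate DocTags markup and strip the prompt tokens from the decoded output.
generated_ids = model.generate(**inputs, max_new_tokens=8192)
doctags = processor.batch_decode(
    generated_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
)[0]
print(doctags)
```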
@@ -145,7 +146,8 @@ sampling_params = SamplingParams(
     temperature=0.0,
     max_tokens=8192)
 
-chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"
+chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>
+Assistant:"
 
 image_files = sorted([f for f in os.listdir(IMAGE_DIR) if f.lower().endswith((".png", ".jpg", ".jpeg"))])
 
@@ -253,6 +255,8 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
 
 **Paper:** [arXiv](https://arxiv.org/abs/2503.11576)
 
+**Project Page:** [HF中国镜像站](https://huggingface.co/ds4sd/SmolDocling-256M-preview)
+
 **Citation:**
 ```
 @misc{nassar2025smoldoclingultracompactvisionlanguagemodel,
@@ -265,4 +269,4 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
       url={https://arxiv.org/abs/2503.11576},
 }
 ```
-**Demo:** [Coming soon]
+**Demo:** [Coming soon]
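For context on the chat_template hunk (README lines 145–153 above): the vllm path builds a plain-text prompt around an `<image>` placeholder and decodes DocTags from the generated text. A minimal sketch under those assumptions follows; `MODEL_PATH`, `IMAGE_DIR`, and `PROMPT_TEXT` mirror the names visible in the hunk, but the surrounding setup is illustrative rather than the README's exact code.

```python
# Minimal vllm inference sketch built around the chat_template the diff touches.
import os
from PIL import Image
from vllm import LLM, SamplingParams

MODEL_PATH = "ds4sd/SmolDocling-256M-preview"  # checkpoint from this repo
IMAGE_DIR = "images"                           # assumed folder of page images
PROMPT_TEXT = "Convert page to Docling."       # assumed task prompt

llm = LLM(model=MODEL_PATH, limit_mm_per_prompt={"image": 1})
sampling_params = SamplingParams(temperature=0.0, max_tokens=8192)

# Image placeholder + user prompt + assistant turn, as in the hunk above.
chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"

image_files = sorted(
    f for f in os.listdir(IMAGE_DIR) if f.lower().endswith((".png", ".jpg", ".jpeg"))
)

for name in image_files:
    image = Image.open(os.path.join(IMAGE_DIR, name)).convert("RGB")
    outputs = llm.generate(
        {"prompt": chat_template, "multi_modal_data": {"image": image}},
        sampling_params=sampling_params,
    )
    doctags = outputs[0].outputs[0].text  # DocTags markup for this page
    print(doctags)
```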
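The hunk header near README line 253 notes that DocTags are integrated with Docling, which allows export to HTML, Markdown, and other formats. A hedged sketch of that conversion step, assuming the DocTags helpers in `docling-core`; class and method names should be checked against the installed version, since newer releases expose a classmethod-based loader.

```python
# Sketch: convert DocTags output into Markdown with docling-core (API assumed, verify locally).
from PIL import Image
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

doctags = "<doctag>...</doctag>"              # raw model output from one of the sketches above
image = Image.open("path/to/page_image.png")  # the page the DocTags describe

# Pair the DocTags string with its source image, then build a DoclingDocument from it.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument(name="ConvertedDocument")
doc.load_from_doctags(doctags_doc)

print(doc.export_to_markdown())  # HTML and JSON exporters are also available
```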