asnassar and nielsr (HF staff) committed
Commit a6a2fd8 · verified · 1 parent: 668322a

Add link to paper and project page (#3)

- Add link to paper and project page (a357c910307598c6ffacd79d1fa2c68594bfa4cb)

Co-authored-by: Niels Rogge <[email protected]>

Files changed (1): README.md (+11 −7)
README.md CHANGED

@@ -1,10 +1,10 @@
 ---
-library_name: transformers
-license: apache-2.0
-language:
-- en
 base_model:
 - HuggingFaceTB/SmolVLM-256M-Instruct
+language:
+- en
+library_name: transformers
+license: apache-2.0
 pipeline_tag: image-text-to-text
 ---
@@ -16,6 +16,8 @@ pipeline_tag: image-text-to-text
 </div>
 </div>
 
+This model was presented in the paper [SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion](https://huggingface.co/papers/2503.11576).
+
 ### 🚀 Features:
 - 🏷️ **DocTags for Efficient Tokenization** – Introduces DocTags an efficient and minimal representation for documents that is fully compatible with **DoclingDocuments**.
 - 🔍 **OCR (Optical Character Recognition)** – Extracts text accurately from images.
@@ -39,7 +41,6 @@ pipeline_tag: image-text-to-text
 - 🧪 **Chemical Recognition**
 - 📙 **Datasets**
 
-
 ## ⌨️ Get started (code examples)
 
 You can use **transformers** or **vllm** to perform inference, and [Docling](https://github.com/docling-project/docling) to convert results to variety of ourput formats (md, html, etc.):
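As a companion to the get-started pointer in the context line above, here is a minimal sketch of plain transformers inference with this checkpoint (not part of the diff; the prompt text and generation settings below are assumptions, not taken from the README):

```python
# Minimal sketch (not part of this diff): transformers inference with the
# SmolDocling checkpoint. Prompt text and max_new_tokens are assumptions.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

MODEL_ID = "ds4sd/SmolDocling-256M-preview"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

image = load_image("page.png")  # any document page image

# Chat-style prompt in the SmolVLM format the processor knows how to render.
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Convert this page to docling."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=8192)

# Keep only the newly generated tokens; the model emits DocTags markup.
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
doctags = processor.batch_decode(new_tokens, skip_special_tokens=False)[0]
print(doctags)
```

The decoded output is DocTags markup, which the Docling sketch further below converts to Markdown.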
@@ -145,7 +146,8 @@ sampling_params = SamplingParams(
     temperature=0.0,
     max_tokens=8192)
 
-chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"
+chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>
+Assistant:"
 
 image_files = sorted([f for f in os.listdir(IMAGE_DIR) if f.lower().endswith((".png", ".jpg", ".jpeg"))])
 
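For context on the changed line, a sketch of how a `chat_template` like this feeds vLLM batch inference; `MODEL_ID`, `PROMPT_TEXT`, and `IMAGE_DIR` mirror names from the README snippet but their values here are assumptions. The sketch keeps the `\n` escape form, since a raw line break inside a regular (non-triple-quoted) f-string is a Python syntax error:

```python
# Minimal sketch (not part of this diff): batch inference with vLLM.
# MODEL_ID, PROMPT_TEXT, and IMAGE_DIR mirror names in the README snippet
# but are assumptions here.
import os
from PIL import Image
from vllm import LLM, SamplingParams

MODEL_ID = "ds4sd/SmolDocling-256M-preview"
PROMPT_TEXT = "Convert page to Docling."
IMAGE_DIR = "pages"

llm = LLM(model=MODEL_ID, limit_mm_per_prompt={"image": 1})
sampling_params = SamplingParams(temperature=0.0, max_tokens=8192)

# The `\n` escape keeps the template a single, valid Python string literal.
chat_template = f"<|im_start|>User:<image>{PROMPT_TEXT}<end_of_utterance>\nAssistant:"

image_files = sorted(f for f in os.listdir(IMAGE_DIR)
                     if f.lower().endswith((".png", ".jpg", ".jpeg")))

for name in image_files:
    image = Image.open(os.path.join(IMAGE_DIR, name)).convert("RGB")
    outputs = llm.generate(
        {"prompt": chat_template, "multi_modal_data": {"image": image}},
        sampling_params=sampling_params)
    print(name, outputs[0].outputs[0].text)  # DocTags for this page
```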
 
@@ -253,6 +255,8 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
 
 **Paper:** [arXiv](https://arxiv.org/abs/2503.11576)
 
+**Project Page:** [HF中国镜像站](https://huggingface.co/ds4sd/SmolDocling-256M-preview)
+
 **Citation:**
 ```
 @misc{nassar2025smoldoclingultracompactvisionlanguagemodel,
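The hunk context above notes that DocTags are integrated with Docling, allowing export to HTML, Markdown, and JSON. A hedged sketch of that conversion path using docling-core (class and method names as I understand the library; `doctags` and `image` are assumed to come from a prior generation step like the ones sketched earlier):

```python
# Minimal sketch (not part of this diff): turning generated DocTags into a
# DoclingDocument with docling-core, then exporting. `doctags` and `image`
# are assumed outputs of a prior SmolDocling generation step.
from PIL import Image
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument

image = Image.open("page.png")
doctags = "<doctag>...</doctag>"  # model output for this page

# Pair the DocTags string with its source image, then build the document.
doctags_doc = DocTagsDocument.from_doctags_and_image_pairs([doctags], [image])
doc = DoclingDocument(name="Document")
doc.load_from_doctags(doctags_doc)

print(doc.export_to_markdown())  # HTML and JSON exporters are also available
```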
@@ -265,4 +269,4 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
   url={https://arxiv.org/abs/2503.11576},
 }
 ```
-**Demo:** [Coming soon]
+**Demo:** [Coming soon]
 