ds4sd
/

SmolDocling-256M-preview

@@ -34,14 +34,16 @@ pipeline_tag: image-text-to-text
 - 📚 **Multi-Page & Full Document Conversion** – Coming Soon.
 - 🧪 **Chemical Recognition** – Coming Soon.
 ## How to get started
 You can use transformers or docling to perform inference:
 <details>
-<summary>Single image inference using Tranformers</summary>
 ```python
 # Prerequisites:
@@ -167,21 +169,50 @@ print(f"Total time: {time.time() - start_time:.2f} sec")
 DocTags create a clear and structured system of tags and rules that separate text from the document's structure. This makes things easier for Image-to-Sequence models by reducing confusion. On the other hand, converting directly to formats like HTML or Markdown can be messy—it often loses details, doesn’t clearly show the document’s layout, and increases the number of tokens, making processing less efficient.
 DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, reducing token generation overhead and improving efficiency.
-## Supported Instructions
-| Instruction | Description |
-| :---: | :---: |
-| Full conversion | Convert this page to docling. |
-| Chart | Convert chart to table (e.g., &lt;chart&gt;). |
-| Formula | Convert formula to LaTeX (e.g., &lt;formula&gt;). |
-| Code | Convert code to text (e.g., &lt;code&gt;). |
-| Table | Convert table to OTSL (e.g., &lt;otsl&gt;). OTSL: [Lysak et al., 2023](https://arxiv.org/pdf/2305.03393) |
-| No-Code Actions/Pipelines | OCR the text in a specific location: &lt;loc_155&gt;&lt;loc_233&gt;&lt;loc_206&gt;&lt;loc_237&gt; |
-|  | Identify element at: &lt;loc_247&gt;&lt;loc_482&gt;&lt;10c_252&gt;&lt;loc_486&gt; |
-|  | Find all 'text' elements on the page, retrieve all section headers. |
-|  | Detect footer elements on the page. |
-- More *Coming soon!* 🚧
 #### Model Summary
@@ -191,6 +222,6 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
 - **License:** Apache 2.0
 - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
-**Repository:** [More Information Needed]
-**Paper [optional]:** [More Information Needed]
-**Demo [optional]:** [More Information Needed]

 - 📚 **Multi-Page & Full Document Conversion** – Coming Soon.
 - 🧪 **Chemical Recognition** – Coming Soon.
+### 🚧 *Coming soon!*
+- 📊 **Better chart recognition 🛠️**
+- 📚 **One shot multi-page inference ⏱️**
 ## How to get started
 You can use transformers or docling to perform inference:
 <details>
+<summary>📄 Single page image inference using Tranformers 🤖</summary>
 ```python
 # Prerequisites:
 DocTags create a clear and structured system of tags and rules that separate text from the document's structure. This makes things easier for Image-to-Sequence models by reducing confusion. On the other hand, converting directly to formats like HTML or Markdown can be messy—it often loses details, doesn’t clearly show the document’s layout, and increases the number of tokens, making processing less efficient.
 DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, reducing token generation overhead and improving efficiency.
+## Supported Instructions
+<table>
+  <tr>
+    <td><b>Description</b></td>
+    <td><b>Instruction</b></td>
+  </tr>
+  <tr>
+    <td>Full conversion</td>
+    <td>Convert this page to docling.</td>
+  </tr>
+  <tr>
+    <td>Chart</td>
+    <td>Convert chart to table (e.g., &lt;chart&gt;).</td>
+  </tr>
+  <tr>
+    <td>Formula</td>
+    <td>Convert formula to LaTeX (e.g., &lt;formula&gt;).</td>
+  </tr>
+  <tr>
+    <td>Code</td>
+    <td>Convert code to text (e.g., &lt;code&gt;).</td>
+  </tr>
+  <tr>
+    <td>Table</td>
+    <td>Convert table to OTSL (e.g., &lt;otsl&gt;). OTSL: <a href="https://arxiv.org/pdf/2305.03393">Lysak et al., 2023</a></td>
+  </tr>
+  <tr>
+    <td>No-Code Actions/Pipelines</td>
+    <td>OCR the text in a specific location: &lt;loc_155&gt;&lt;loc_233&gt;&lt;loc_206&gt;&lt;loc_237&gt;</td>
+  </tr>
+  <tr>
+    <td></td>
+    <td>Identify element at: &lt;loc_247&gt;&lt;loc_482&gt;&lt;10c_252&gt;&lt;loc_486&gt;</td>
+  </tr>
+  <tr>
+    <td></td>
+    <td>Find all 'text' elements on the page, retrieve all section headers.</td>
+  </tr>
+  <tr>
+    <td></td>
+    <td>Detect footer elements on the page.</td>
+  </tr>
+</table>
 #### Model Summary
 - **License:** Apache 2.0
 - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
+**Repository:** [Docling](https://github.com/docling-project/docling)
+**Paper [optional]:** [Coming soon]
+**Demo [optional]:** [Coming soon]