MaxMnemonic committed on
Commit 904b2d3 · verified · 1 Parent(s): 3e8f36f

Update README.md

Files changed (1)
  1. README.md +51 -20
README.md CHANGED
@@ -34,14 +34,16 @@ pipeline_tag: image-text-to-text
  - 📚 **Multi-Page & Full Document Conversion** – Coming Soon.
  - 🧪 **Chemical Recognition** – Coming Soon.
 
-
 
  ## How to get started
 
  You can use transformers or docling to perform inference:
 
  <details>
- <summary>Single image inference using Transformers</summary>
 
  ```python
  # Prerequisites:
@@ -167,21 +169,50 @@ print(f"Total time: {time.time() - start_time:.2f} sec")
  DocTags create a clear and structured system of tags and rules that separate text from the document's structure. This makes things easier for Image-to-Sequence models by reducing confusion. On the other hand, converting directly to formats like HTML or Markdown can be messy—it often loses details, doesn’t clearly show the document’s layout, and increases the number of tokens, making processing less efficient.
  DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, reducing token generation overhead and improving efficiency.
 
- ## Supported Instructions
- | Instruction | Description |
- | :---: | :---: |
- | Full conversion | Convert this page to docling. |
- | Chart | Convert chart to table (e.g., &lt;chart&gt;). |
- | Formula | Convert formula to LaTeX (e.g., &lt;formula&gt;). |
- | Code | Convert code to text (e.g., &lt;code&gt;). |
- | Table | Convert table to OTSL (e.g., &lt;otsl&gt;). OTSL: [Lysak et al., 2023](https://arxiv.org/pdf/2305.03393) |
- | No-Code Actions/Pipelines | OCR the text in a specific location: &lt;loc_155&gt;&lt;loc_233&gt;&lt;loc_206&gt;&lt;loc_237&gt; |
- | | Identify element at: &lt;loc_247&gt;&lt;loc_482&gt;&lt;loc_252&gt;&lt;loc_486&gt; |
- | | Find all 'text' elements on the page, retrieve all section headers. |
- | | Detect footer elements on the page. |
-
-
- - More *Coming soon!* 🚧
 
  #### Model Summary
 
@@ -191,6 +222,6 @@ DocTags are integrated with Docling, which allows export to HTML, Markdown, and
  - **License:** Apache 2.0
  - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
 
- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]
 
  - 📚 **Multi-Page & Full Document Conversion** – Coming Soon.
  - 🧪 **Chemical Recognition** – Coming Soon.
 
+ ### 🚧 *Coming soon!*
+ - 📊 **Better chart recognition 🛠️**
+ - 📚 **One-shot multi-page inference ⏱️**
 
  ## How to get started
 
  You can use transformers or docling to perform inference:
 
  <details>
+ <summary>📄 Single page image inference using Transformers 🤖</summary>
 
  ```python
  # Prerequisites:
 
  DocTags create a clear and structured system of tags and rules that separate text from the document's structure. This makes things easier for Image-to-Sequence models by reducing confusion. On the other hand, converting directly to formats like HTML or Markdown can be messy—it often loses details, doesn’t clearly show the document’s layout, and increases the number of tokens, making processing less efficient.
  DocTags are integrated with Docling, which allows export to HTML, Markdown, and JSON. These exports can be offloaded to the CPU, reducing token generation overhead and improving efficiency.
 
+ ## Supported Instructions
+
+ <table>
+ <tr>
+ <td><b>Description</b></td>
+ <td><b>Instruction</b></td>
+ </tr>
+ <tr>
+ <td>Full conversion</td>
+ <td>Convert this page to docling.</td>
+ </tr>
+ <tr>
+ <td>Chart</td>
+ <td>Convert chart to table (e.g., &lt;chart&gt;).</td>
+ </tr>
+ <tr>
+ <td>Formula</td>
+ <td>Convert formula to LaTeX (e.g., &lt;formula&gt;).</td>
+ </tr>
+ <tr>
+ <td>Code</td>
+ <td>Convert code to text (e.g., &lt;code&gt;).</td>
+ </tr>
+ <tr>
+ <td>Table</td>
+ <td>Convert table to OTSL (e.g., &lt;otsl&gt;). OTSL: <a href="https://arxiv.org/pdf/2305.03393">Lysak et al., 2023</a></td>
+ </tr>
+ <tr>
+ <td>No-Code Actions/Pipelines</td>
+ <td>OCR the text in a specific location: &lt;loc_155&gt;&lt;loc_233&gt;&lt;loc_206&gt;&lt;loc_237&gt;</td>
+ </tr>
+ <tr>
+ <td></td>
+ <td>Identify element at: &lt;loc_247&gt;&lt;loc_482&gt;&lt;loc_252&gt;&lt;loc_486&gt;</td>
+ </tr>
+ <tr>
+ <td></td>
+ <td>Find all 'text' elements on the page, retrieve all section headers.</td>
+ </tr>
+ <tr>
+ <td></td>
+ <td>Detect footer elements on the page.</td>
+ </tr>
+ </table>
 
  #### Model Summary
 
 
  - **License:** Apache 2.0
  - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
 
+ **Repository:** [Docling](https://github.com/docling-project/docling)
+ **Paper [optional]:** [Coming soon]
+ **Demo [optional]:** [Coming soon]
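
The two location-based instructions in the Supported Instructions table ("OCR the text in a specific location" and "Identify element at") pass a region as four `<loc_N>` tokens. The Python sketch below shows one way to build those prompts and read boxes back out of model output. It is not part of the README, Transformers, or Docling. The helper names (`loc_tokens`, `pixel_box_to_loc`, `ocr_at`, `identify_at`, `parse_loc_boxes`) are hypothetical. It also assumes the four values are ordered (x0, y0, x1, y1) on a 0 to 500 grid, which this commit does not state.

```python
import re

# ASSUMPTION: loc values lie on a 0..500 grid and are ordered (x0, y0, x1, y1).
# Neither detail is stated in this README; check the model card before relying on it.
LOC_GRID = 500
LOC_RE = re.compile(r"<loc_(\d+)>")


def loc_tokens(box):
    """Render a box already in loc units as four <loc_N> tokens (hypothetical helper)."""
    if len(box) != 4:
        raise ValueError("expected four coordinates (x0, y0, x1, y1)")
    for value in box:
        if not 0 <= value <= LOC_GRID:
            raise ValueError(f"coordinate {value} is outside 0..{LOC_GRID}")
    return "".join(f"<loc_{int(value)}>" for value in box)


def pixel_box_to_loc(box, width, height):
    """Scale a pixel-space box onto the assumed loc grid (hypothetical helper)."""
    x0, y0, x1, y1 = box
    return (
        round(x0 / width * LOC_GRID),
        round(y0 / height * LOC_GRID),
        round(x1 / width * LOC_GRID),
        round(y1 / height * LOC_GRID),
    )


def ocr_at(box):
    # Prompt prefix copied verbatim from the "No-Code Actions/Pipelines" row.
    return "OCR the text in a specific location: " + loc_tokens(box)


def identify_at(box):
    # Prompt prefix copied verbatim from the "Identify element at" row.
    return "Identify element at: " + loc_tokens(box)


def parse_loc_boxes(text):
    """Collect every <loc_N> value in text and group the values four at a time."""
    values = [int(v) for v in LOC_RE.findall(text)]
    if len(values) % 4:
        raise ValueError("number of loc tokens is not a multiple of four")
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]


if __name__ == "__main__":
    print(ocr_at((155, 233, 206, 237)))
    # OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237>
    print(parse_loc_boxes("<loc_247><loc_482><loc_252><loc_486>"))
    # [(247, 482, 252, 486)]
```

Copying the prompt prefixes verbatim from the table and range-checking each coordinate before rendering it means a malformed box raises an error instead of silently producing an out-of-range token.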