--- library_name: transformers license: apache-2.0 language: - en base_model: - HuggingFaceTB/SmolVLM-256M-Instruct pipeline_tag: image-text-to-text ---
SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for DoclingDocuments.
Description | Instruction |
Full conversion | Convert this page to docling. |
Chart | Convert chart to table (e.g., <chart>). |
Formula | Convert formula to LaTeX (e.g., <formula>). |
Code | Convert code to text (e.g., <code>). |
Table | Convert table to OTSL (e.g., <otsl>). OTSL: Lysak et al., 2023 |
No-Code Actions/Pipelines | OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237> |
Identify element at: <loc_247><loc_482><10c_252><loc_486> | |
Find all 'text' elements on the page, retrieve all section headers. | |
Detect footer elements on the page. |