--- base_model: - HuggingFaceTB/SmolVLM-256M-Instruct language: - en library_name: transformers license: cdla-permissive-2.0 pipeline_tag: image-text-to-text ---
SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for DoclingDocuments.
Description | Instruction | Comment |
Full conversion | Convert this page to docling. | DocTags represetation |
Chart | Convert chart to table. | (e.g., <chart>) |
Formula | Convert formula to LaTeX. | (e.g., <formula>) |
Code | Convert code to text. | (e.g., <code>) |
Table | Convert table to OTSL. | (e.g., <otsl>) OTSL: Lysak et al., 2023 |
Actions and Pipelines | OCR the text in a specific location: <loc_155><loc_233><loc_206><loc_237> | |
Identify element at: <loc_247><loc_482><10c_252><loc_486> | ||
Find all 'text' elements on the page, retrieve all section headers. | ||
Detect footer elements on the page. |