asnassar committed on
Commit ee67c58 · verified · 1 Parent(s): 3d3bb6a

Update README.md

Files changed (1)
  1. README.md +40 -162
README.md CHANGED
@@ -10,15 +10,6 @@ pipeline_tag: image-text-to-text
10
 
11
  ### SmolDocling-256M-preview
12
 
13
- SmolDocling is a multimodal Image-Text-to-Text model that features
14
-
15
-
16
- ## Model Details
17
-
18
- ### Model Description
19
-
20
- ### SmolDocling-256M-preview
21
-
22
  SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for **DoclingDocuments**.
23
 
24
  ### 🚀 Features:
@@ -51,167 +42,54 @@ SmolDocling is a multimodal Image-Text-to-Text model designed for efficient docu
51
  - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
52
 
53
 
54
- ## Uses
55
-
56
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
57
-
58
- ### Direct Use
59
-
60
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Downstream Use [optional]
65
-
66
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
67
-
68
- [More Information Needed]
69
-
70
- ### Out-of-Scope Use
71
-
72
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
73
-
74
- [More Information Needed]
75
-
76
- ## Bias, Risks, and Limitations
77
-
78
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
79
-
80
- [More Information Needed]
81
-
82
- ### Recommendations
83
-
84
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
85
-
86
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
87
-
88
- ## How to Get Started with the Model
89
-
90
- Use the code below to get started with the model.
91
-
92
- [More Information Needed]
93
-
94
- ## Training Details
95
-
96
- ### Training Data
97
-
98
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
99
-
100
- [More Information Needed]
101
-
102
- ### Training Procedure
103
-
104
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
105
-
106
- #### Preprocessing [optional]
107
-
108
- [More Information Needed]
109
-
110
-
111
- #### Training Hyperparameters
112
-
113
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
114
-
115
- #### Speeds, Sizes, Times [optional]
116
-
117
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
118
-
119
- [More Information Needed]
120
-
121
- ## Evaluation
122
-
123
- <!-- This section describes the evaluation protocols and provides the results. -->
124
-
125
- ### Testing Data, Factors & Metrics
126
-
127
- #### Testing Data
128
-
129
- <!-- This should link to a Dataset Card if possible. -->
130
-
131
- [More Information Needed]
132
-
133
- #### Factors
134
-
135
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
136
-
137
- [More Information Needed]
138
-
139
- #### Metrics
140
-
141
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
142
-
143
- [More Information Needed]
144
-
145
- ### Results
146
-
147
- [More Information Needed]
148
-
149
- #### Summary
150
-
151
-
152
-
153
- ## Model Examination [optional]
154
-
155
- <!-- Relevant interpretability work for the model goes here -->
156
-
157
- [More Information Needed]
158
-
159
- ## Environmental Impact
160
-
161
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
162
-
163
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
164
-
165
- - **Hardware Type:** [More Information Needed]
166
- - **Hours used:** [More Information Needed]
167
- - **Cloud Provider:** [More Information Needed]
168
- - **Compute Region:** [More Information Needed]
169
- - **Carbon Emitted:** [More Information Needed]
170
-
171
- ## Technical Specifications [optional]
172
-
173
- ### Model Architecture and Objective
174
-
175
- [More Information Needed]
176
-
177
- ### Compute Infrastructure
178
-
179
- [More Information Needed]
180
-
181
- #### Hardware
182
-
183
- [More Information Needed]
184
-
185
- #### Software
186
-
187
- [More Information Needed]
188
-
189
- ## Citation [optional]
190
-
191
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
192
-
193
- **BibTeX:**
194
-
195
- [More Information Needed]
196
 
197
- **APA:**
198
 
199
- [More Information Needed]
200
 
201
- ## Glossary [optional]
202
 
203
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
204

205
- [More Information Needed]
206

207
- ## More Information [optional]
208

209
- [More Information Needed]
210

211
- ## Model Card Authors [optional]
212

213
- [More Information Needed]
214

215
- ## Model Card Contact
216

217
- [More Information Needed]
10
 
11
  ### SmolDocling-256M-preview
12

13
  SmolDocling is a multimodal Image-Text-to-Text model designed for efficient document conversion. It retains Docling's most popular features while ensuring full compatibility with Docling through seamless support for **DoclingDocuments**.
14
 
15
  ### 🚀 Features:
 
42
  - **Finetuned from model:** Based on [Idefics3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (see technical summary)
43
 
44
 
45
+ ### How to get started
46
 
47
+ You can use transformers or docling to perform inference:
48
 
49
+ # Transformers:
50
 
 
51
 
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoProcessor, AutoModelForVision2Seq
+ from transformers.image_utils import load_image

+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

+ # Load images
+ image = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")

+ # Initialize processor and model
+ processor = AutoProcessor.from_pretrained("ds4sd/SmolDocling-256M-preview")
+ model = AutoModelForVision2Seq.from_pretrained(
+     "ds4sd/SmolDocling-256M-preview",
+     torch_dtype=torch.bfloat16,
+     _attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
+ ).to(DEVICE)

+ # Create input messages
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image"},
+             {"type": "text", "text": "Convert this page to docling."}
+         ]
+     },
+ ]

+ # Prepare inputs
+ prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+ inputs = processor(text=prompt, images=[image], return_tensors="pt")
+ inputs = inputs.to(DEVICE)

+ # Generate outputs
+ generated_ids = model.generate(**inputs, max_new_tokens=500)
+ generated_texts = processor.batch_decode(
+     generated_ids,
+     skip_special_tokens=True,
+ )

+ print(generated_texts[0])
+ """