asnassar committed
Commit 1074b94 · verified · 1 parent: cd2c4cb

Update README.md

Files changed (1): README.md (+44, −3)
README.md CHANGED
@@ -40,16 +40,57 @@ pipeline_tag: image-text-to-text
 You can use transformers or docling to perform inference:
 
 <details>
-<summary>Inference using Docling</summary>
+<summary>Single image inference using Transformers</summary>
 
 ```python
+import torch
+from PIL import Image
+from transformers import AutoProcessor, AutoModelForVision2Seq
+from transformers.image_utils import load_image
+
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+# Load images
+image = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
+
+# Initialize processor and model
+processor = AutoProcessor.from_pretrained("ds4sd/SmolDocling-256M-preview")
+model = AutoModelForVision2Seq.from_pretrained(
+    "ds4sd/SmolDocling-256M-preview",
+    torch_dtype=torch.bfloat16,
+    _attn_implementation="flash_attention_2" if DEVICE == "cuda" else "eager",
+).to(DEVICE)
+
+# Create input messages
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image"},
+            {"type": "text", "text": "Convert this page to docling."}
+        ]
+    },
+]
+
+# Prepare inputs
+prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
+inputs = processor(text=prompt, images=[image], return_tensors="pt")
+inputs = inputs.to(DEVICE)
+
+# Generate outputs
+generated_ids = model.generate(**inputs, max_new_tokens=500)
+generated_texts = processor.batch_decode(
+    generated_ids,
+    skip_special_tokens=True,
+)
 
 print(generated_texts[0])
 ```
 </details>
 
+
 <details>
-<summary>Single image inference using Transformers</summary>
+<summary>Multi-page image inference using Transformers</summary>
 
 ```python
 import torch
@@ -94,7 +135,7 @@ generated_texts = processor.batch_decode(
 )
 
 print(generated_texts[0])
-```
+``````
 </details>
 
 <details>
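
The commit also adds a "Multi-page image inference using Transformers" section, but most of its body falls outside the hunks shown above. For orientation only, the sketch below mirrors the single-image example to run several page images through the same processor and model in one prompt; the multi-image message layout and the placeholder file names (`page_1.png`, `page_2.png`) are assumptions, not the code that was committed.

```python
# Sketch only: multi-page inference mirroring the single-image example above.
# The multi-image message layout and the local page files are assumptions.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder page images (hypothetical local files; URLs also work with load_image).
page_images = [load_image("page_1.png"), load_image("page_2.png")]

processor = AutoProcessor.from_pretrained("ds4sd/SmolDocling-256M-preview")
model = AutoModelForVision2Seq.from_pretrained(
    "ds4sd/SmolDocling-256M-preview",
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# One {"type": "image"} placeholder per page, followed by the same instruction
# used in the single-image example.
messages = [
    {
        "role": "user",
        "content": [{"type": "image"} for _ in page_images]
        + [{"type": "text", "text": "Convert this page to docling."}],
    },
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=page_images, return_tensors="pt").to(DEVICE)

generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts[0])
```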
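
The intro line kept by this commit still offers docling as an alternative to transformers, while the "Inference using Docling" block removed here was an empty placeholder. As a rough sketch of that route, the snippet below uses the docling package's `DocumentConverter`; this is generic docling document conversion rather than a confirmed SmolDocling-specific pipeline, and `"document.pdf"` is a placeholder source.

```python
# Sketch only: generic document conversion with the docling package.
# Assumes docling's DocumentConverter API; "document.pdf" is a placeholder
# path and can also be a URL.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("document.pdf")

# Export the parsed document, e.g. as Markdown.
print(result.document.export_to_markdown())
```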