
How to get coordinates from doctags to draw bounding box in the image?

#18
by deewalia20 - opened

For a given doctag, how do I determine the coordinates of the bounding box (x, y, width, height)?

Docling org

@deewalia20 it's x1, y1, x2, y2, with the origin at the top-left.
Each Doctag loc_ coordinate is a proportion of the page image width (for x) or height (for y), expressed in the range 0..500.

Thanks, I got it now.
For a given Doctag location info <loc_proportional_x1><loc_proportional_y1><loc_proportional_x2><loc_proportional_y2>, use the formulas below, where image_width and image_height are the dimensions of the source image.
Then use OpenCV or Pillow to draw the bounding box.

To convert these proportional coordinates to actual pixel values, you can use the following formulas:

$$\text{actual\_x1} = \left( \frac{\text{proportional\_x1}}{500} \right) \times \text{image\_width}$$

$$\text{actual\_y1} = \left( \frac{\text{proportional\_y1}}{500} \right) \times \text{image\_height}$$
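The conversion discussed in this thread can be sketched in Python roughly as follows. This is only an illustration, not code from the Docling project: it assumes the location tokens appear in the doctags string as four consecutive numeric tags such as `<loc_50><loc_100><loc_250><loc_150>`, and the helper names (`loc_to_pixels`, `extract_boxes`, `draw_boxes`) and the sample string are made up for this example. The x2 and y2 values are scaled the same way as x1 and y1, since every loc_ coordinate is a proportion of the page width or height.

```python
import re

# loc_ values are proportions in the range 0..500 (per the reply in this thread).
DOCTAG_SCALE = 500

# Assumption: a box is written as four consecutive numeric <loc_N> tokens.
LOC_PATTERN = re.compile(r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def loc_to_pixels(loc, image_width, image_height, scale=DOCTAG_SCALE):
    """Convert (x1, y1, x2, y2) on the 0..scale grid to pixel coordinates."""
    x1, y1, x2, y2 = loc
    return (
        x1 * image_width / scale,
        y1 * image_height / scale,
        x2 * image_width / scale,
        y2 * image_height / scale,
    )


def extract_boxes(doctags, image_width, image_height):
    """Find every four-token location group and return pixel-space boxes."""
    boxes = []
    for match in LOC_PATTERN.finditer(doctags):
        loc = tuple(int(value) for value in match.groups())
        boxes.append(loc_to_pixels(loc, image_width, image_height))
    return boxes


def draw_boxes(image, boxes, color="red", width=2):
    """Return a copy of a Pillow image with each box outlined."""
    # Imported here so the conversion helpers work without Pillow installed.
    from PIL import ImageDraw

    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    for box in boxes:
        # Pillow expects (x1, y1, x2, y2) with the origin at the top-left.
        draw.rectangle(box, outline=color, width=width)
    return annotated


if __name__ == "__main__":
    from PIL import Image

    # Hypothetical blank page and doctags snippet, for illustration only.
    page = Image.new("RGB", (1000, 2000), "white")
    sample = "<text><loc_50><loc_100><loc_250><loc_150>Hello</text>"
    boxes = extract_boxes(sample, *page.size)
    draw_boxes(page, boxes).save("annotated.png")
```

Because the boxes are already in (x1, y1, x2, y2) form, no width/height conversion is needed for Pillow. If a library expects (x, y, w, h) instead, the width is x2 - x1 and the height is y2 - y1.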

deewalia20 changed discussion status to closed