smolagents can see 🔥
we just shipped vision support to smolagents 🤗 agentic computers FTW
you can now:
💻 let the agent get images dynamically (e.g. agentic web browser)
📑 pass images at the init of the agent (e.g. chatting with documents, filling forms automatically etc)
with just a few LoC changed! 🤯
you can use transformers models locally (like Qwen2VL) OR plug in your favorite multimodal inference provider (gpt-4o, anthropic & co) 🤠
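here's a rough sketch of the "pass images up front" flow, not the official recipe. it assumes the agent exposes `run(task, images=[...])` as described in the blog; `ask_about_images`, `build_agent` and `invoice.png` are hypothetical names for illustration, and picking `OpenAIServerModel` with `gpt-4o` is just one assumed setup:

```python
from typing import Any, Sequence


def ask_about_images(agent: Any, task: str, images: Sequence[Any]) -> Any:
    """Run `agent` on `task`, handing it `images` at the start of the run.

    Assumption: the agent exposes `run(task, images=[...])`; for smolagents
    the images would typically be PIL images.
    """
    if not images:
        raise ValueError("expected at least one image")
    return agent.run(task, images=list(images))


def build_agent() -> Any:
    """Build a vision-capable agent (hypothetical setup, needs smolagents)."""
    # Imported lazily so the helper above works without smolagents installed.
    from smolagents import CodeAgent, OpenAIServerModel

    # Assumption: an OpenAI API key is available in the environment.
    model = OpenAIServerModel(model_id="gpt-4o")
    return CodeAgent(tools=[], model=model)


# Usage (commented out because it needs smolagents, Pillow and network access):
# from PIL import Image
# agent = build_agent()
# print(ask_about_images(agent, "What is the total on this invoice?",
#                        [Image.open("invoice.png")]))
```

the helper only forwards to `agent.run`, so swapping in a local transformers model or another provider should only touch `build_agent`.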
read our blog http://hf.co/blog/smolagents-can-see