somosnlp (SomosNLP)

lewtun

authored a paper 2 days ago

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 4 days ago • 31

lewtun

posted an update 3 days ago

Post

1823

Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems 🧑‍💻

- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B

We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as others over 100x larger 💪

Together with the models, we are releasing:

📊CodeForces-CoTs: new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python open-r1/codeforces-cots

🏆 IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi

For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3

1 reply

·

albertvillanova

posted an update 7 days ago

Post

3528

🚀 New smolagents update: Safer Local Python Execution! 🦾🐍

With the latest release, we've added security checks to the local Python interpreter: every evaluation is now analyzed for dangerous builtins, modules, and functions. 🔒

Here's why this matters & what you need to know! 🧵👇

1️⃣ Why is local execution risky? ⚠️
AI agents that run arbitrary Python code can unintentionally (or maliciously) access system files, run unsafe commands, or exfiltrate data.

2️⃣ New Safety Layer in smolagents 🛡️
We now inspect every return value during execution:
✅ Allowed: Safe built-in types (e.g., numbers, strings, lists)
⛔ Blocked: Dangerous functions/modules (e.g., os.system, subprocess, exec, shutil)

3️⃣ Immediate Benefits 💡
- Prevent agents from accessing unsafe builtins
- Block unauthorized file or network access
- Reduce accidental security vulnerabilities

4️⃣ Security Disclaimer ⚠️
🚨 Despite these improvements, local Python execution is NEVER 100% safe. 🚨
If you need true isolation, use a remote sandboxed executor like Docker or E2B.

5️⃣ The Best Practice: Use Sandboxed Execution 🔐
For production-grade AI agents, we strongly recommend running code in a Docker or E2B sandbox to ensure complete isolation.

6️⃣ Upgrade Now & Stay Safe! 🚀
Check out the latest smolagents release and start building safer AI agents today.

🔗 https://github.com/huggingface/smolagents

What security measures do you take when running AI-generated code? Let’s discuss! 👇

#AI #smolagents #Python #Security

2 replies

·

albertvillanova

posted an update 8 days ago

Post

3772

🚀 Big news for AI agents! With the latest release of smolagents, you can now securely execute Python code in sandboxed Docker or E2B environments. 🦾🔒

Here's why this is a game-changer for agent-based systems: 🧵👇

1️⃣ Security First 🔐
Running AI agents in unrestricted Python environments is risky! With sandboxing, your agents are isolated, preventing unintended file access, network abuse, or system modifications.

2️⃣ Deterministic & Reproducible Runs 📦
By running agents in containerized environments, you ensure that every execution happens in a controlled and predictable setting—no more environment mismatches or dependency issues!

3️⃣ Resource Control & Limits 🚦
Docker and E2B allow you to enforce CPU, memory, and execution time limits, so rogue or inefficient agents don’t spiral out of control.

4️⃣ Safer Code Execution in Production 🏭
Deploy AI agents confidently, knowing that any generated code runs in an ephemeral, isolated environment, protecting your host machine and infrastructure.

5️⃣ Easy to Integrate 🛠️
With smolagents, you can simply configure your agent to use Docker or E2B as its execution backend—no need for complex security setups!

6️⃣ Perfect for Autonomous AI Agents 🤖
If your AI agents generate and execute code dynamically, this is a must-have to avoid security pitfalls while enabling advanced automation.

⚡ Get started now: https://github.com/huggingface/smolagents

What will you build with smolagents? Let us know! 🚀💡

davidberenstein1957

posted an update 8 days ago

Post

2221

🔥 Text2SQL, explore and share any data analysis!

🤗 HF中国镜像站 - Dataset Studio is an amazing new feature.

🚀 Start yourself: fka/awesome-chatgpt-prompts

📺 YouTube: https://youtu.be/5LUZq7MHolA?feature=shared

davidberenstein1957

posted an update 10 days ago

Post

4169

🥊 Epic Agent Framework Showdown! Available today!

🔵 In the blue corner, the versatile challenger with a proven track record of knowledge retrieval: LlamaIndex!

🛑 In the red corner, the defender, weighing in with lightweight efficiency: HF中国镜像站 smolagents!

🔗 URL: https://huggingface.co/agents-course

We just published the LlamaIndex unit for the agents course, and it is set to offer a great contrast between the smolagents unit by looking at

- What makes llama-index stand-out
- How the LlamaHub is used for integrations
- Creating QueryEngine components
- Using agents and tools
- Agentic and multi-agent workflows

The team has been working flat-out on this for a few weeks. Supported by Logan Markewich and Laurie Voss over at LlamaIndex.

Who won? You decide!

davidberenstein1957

posted an update 10 days ago

Post

2979

🫸 New release to push vector search to the Hub with vicinity and work with any serialisable objects.

🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.

🔗 Example Repo: minishlab/my-vicinity-repo

davidberenstein1957

posted an update about 1 month ago

Post

3288

🚀 Find banger tools for your smolagents!

I created the Tools gallery, which makes tools specifically developed by/for smolagents searchable and visible. This will help with:
- inspiration
- best practices
- finding cool tools

Space: davidberenstein1957/smolagents-and-tools

1 reply

·

lewtun

posted an update about 1 month ago

Post

4878

Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g for cases with malformed answers that can’t be verified with a rules-based parser)

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2

davidberenstein1957

posted an update about 1 month ago

Post

2474

Fine-tune Deepseek-R1 with a Synthetic Reasoning Dataset

Blog: https://huggingface.co/blog/sdiazlor/fine-tune-deepseek-with-a-synthetic-reasoning-data

lewtun

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 203

davidberenstein1957

posted an update about 1 month ago

Post

2072

Agentic RAG: Applied, visual, and step-by-step! 🐾

Get familiar with the Agents and tools, not the bells and whistles!

Retrieve - Augment and now GENERATE.

part 3: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-3-generate

davidberenstein1957

posted an update about 1 month ago

Post

2877

Anyone can create free hosted tools for their AI agents! 🔥

Agentic RAG stack part 2 - augment
Augment retrieval results by reranking optimises content without increasing time too much

part2: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-2-augment
code: https://github.com/huggingface/ai-blueprint

albertvillanova

posted an update about 1 month ago

Post

3776

🚀 Introducing @huggingface Open Deep-Research💥

In just 24 hours, we built an open-source agent that:
✅ Autonomously browse the web
✅ Search, scroll & extract info
✅ Download & manipulate files
✅ Run calculations on data

55% on GAIA validation set! Help us improve it!💡
https://huggingface.co/blog/open-deep-research

3 replies

·

mariagrandury

updated a collection about 1 month ago

Corpus: Evaluation datasets for ES & LATAM

Collection

Corpus of La Leaderboard, the open LLM leaderboard for ES & LATAM • 56 items • Updated Feb 5 • 4

tadeodonegana

posted an update about 1 month ago

Post

1148

At RooMix(dot)ai we’re looking for an expert in generative image models for a short consulting gig. Any recommendations?

1 reply

·

davidberenstein1957

posted an update about 1 month ago

Post

2007

Creating an agentic RAG stack on the HF中国镜像站 Hub - part 1 - retrieval (1/5).

🚀 Web apps and microservices included!

Chunk, embed and index documents at a huge scale without overhead.

Blog: https://huggingface.co/blog/davidberenstein1957/ai-blueprint-agentic-rag-part-1-retrieve

davidberenstein1957

posted an update about 1 month ago

Post

1631

tldr; Parquet is awesome, DuckDB too!

Datasets on the HF中国镜像站 Hub rely on parquet files. We can interact with these files using DuckDB as a fast in-memory database system. One of DuckDB’s features is vector similarity search which can be used with or without an index.

blog:
https://huggingface.co/learn/cookbook/vector_search_with_hub_as_backend