Vectara

company

https://vectara.com

vectara

AI & ML interests

retrieval augmented generation, grounded generation, large language models, LLMs, question answering, chatbot

vectara's activity

stsui96

updated a dataset about 17 hours ago

vectara/leaderboard_results

Viewer • Updated about 17 hours ago • 103k • 412 • 2

stsui96

updated a Space 1 day ago

114

HHEM Leaderboard

🥇

Browse and submit language model benchmarks

clefourrier

posted an update 2 days ago

Post

1536

Gemma3 family is out! Reading the tech report, and this section was really interesting to me from a methods/scientific fairness pov.

Instead of doing over-hyped comparisons, they clearly state that **results are reported in a setup which is advantageous to their models**.
(Which everybody does, but people usually don't say)

For a tech report, it makes a lot of sense to report model performance when used optimally!
On leaderboards, on the other hand, the comparison will be apples to apples, but potentially in a suboptimal way for a given model family (just as some users interact suboptimally with models).

It also contains a cool section (6) on training data memorization rate! It's important to see whether your model will output the training data it has seen as such: always an issue for privacy/copyright/... but also very much for evaluation!

Because if your model knows its evals by heart, you're not testing for generalization.

joao-vectara

updated a Space 3 days ago

STCBank

🦀

Upload documents to your digital bank assistant

joao-vectara

published a Space 8 days ago

Clinical Trials Assistant

👨

Clinical Trial assistant using vectara-agentic

ofermend

published a Space 9 days ago

UCSF Ortho Demo

🐨

Ask questions about UCSF Orthopedics

joao-vectara

published a Space 12 days ago

STCBank

🦀

Upload documents to your digital bank assistant

nthakur

authored a paper 18 days ago

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published 23 days ago • 32

clefourrier

authored a paper about 1 month ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 203

clefourrier

authored a paper 3 months ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 18

nthakur

authored a paper 4 months ago

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems

Paper • 2410.13716 • Published Oct 17, 2024

nthakur

authored a paper 7 months ago

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Paper • 2406.16828 • Published Jun 24, 2024

ofermend

posted an update 7 months ago

Post

1830

I'm excited to share our updated hallucination evaluation model (called HHEM-2.1-Open) as well as the updated leaderboard that ranks LLMs by their propensity to hallucinate.

vectara/Hallucination-evaluation-leaderboard

1 reply

clefourrier

authored 2 papers 9 months ago

The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models

Paper • 2404.05904 • Published Apr 8, 2024 • 9

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 192

ofermend

posted an update 11 months ago

Post

1756

If you are a debate fan or did debate as an extracurricular activity as a kid, you might have fun with this demo: debate bot. Debate against AI/RAG:

vectara/debate-bot

4 replies

nthakur

posted an update 11 months ago

Post

3405

🦢 The SWIM-IR dataset contains 29 million text-retrieval training pairs across 27 diverse languages. It is one of the largest synthetic multilingual datasets generated using PaLM 2 on Wikipedia! 🔥🔥

The SWIM-IR dataset contains three subsets:
- Cross-lingual: nthakur/swim-ir-cross-lingual
- Monolingual: nthakur/swim-ir-monolingual
- Indic Cross-lingual: nthakur/indic-swim-ir-cross-lingual

Check it out:
https://huggingface.co/collections/nthakur/swim-ir-dataset-662ddaecfc20896bf14dd9b7

clefourrier

posted an update 11 months ago

Post

6119

In basic chatbots, errors are annoyances. In medical LLMs, errors can have life-threatening consequences 🩸

It's therefore vital to benchmark/follow advances in medical LLMs before even thinking about deployment.

This is why a small research team introduced a medical LLM leaderboard, to get reproducible and comparable results between LLMs, and allow everyone to follow advances in the field.

openlifescienceai/open_medical_llm_leaderboard

Congrats to @aaditya and @pminervini !
Learn more in the blog: https://huggingface.co/blog/leaderboard-medicalllm

clefourrier

posted an update 11 months ago

Post

4764

Contamination-free code evaluations with LiveCodeBench! 🖥️

LiveCodeBench is a new leaderboard, which contains:
- complete code evaluations (on code generation, self repair, code execution, tests)
- my favorite feature: problem selection by publication date 📅

This feature means that you can get model scores averaged only over new problems that fall outside the training data. This means... contamination-free code evals! 🚀
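To make the idea of selecting problems by publication date concrete, here is a minimal sketch. It is not LiveCodeBench's actual code or API: the `ProblemResult` record, the `score_after_cutoff` helper, and the sample data are all hypothetical, and the cutoff date stands in for an assumed training-data cutoff of the model being scored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProblemResult:
    problem_id: str   # hypothetical identifier
    published: date   # date the problem was first released
    passed: bool      # whether the model solved the problem


def score_after_cutoff(results, cutoff):
    """Pass rate over problems published strictly after `cutoff`.

    Returns None when no problem is newer than the cutoff.
    """
    fresh = [r for r in results if r.published > cutoff]
    if not fresh:
        return None
    return sum(r.passed for r in fresh) / len(fresh)


# Made-up results for illustration only.
results = [
    ProblemResult("p1", date(2023, 5, 1), True),
    ProblemResult("p2", date(2023, 9, 15), True),
    ProblemResult("p3", date(2024, 1, 10), False),
    ProblemResult("p4", date(2024, 2, 20), True),
]

# Score over every problem, old and new alike.
overall = sum(r.passed for r in results) / len(results)

# Score restricted to problems released after the assumed training cutoff.
fresh_score = score_after_cutoff(results, cutoff=date(2023, 12, 31))

print(overall, fresh_score)
```

In this toy data the score on post-cutoff problems is lower than the overall score, which is the kind of gap that could hint at contamination on older problems.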

Check it out!

Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard

Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!

clefourrier

posted an update 11 months ago

Post

2236

🆕 Evaluate your RL agents - who's best at Atari?🏆

The new RL leaderboard evaluates agents in 87 possible environments (from Atari 🎮 to motion control simulations🚶and more)!

When you submit your model, it's run and evaluated in real time - and the leaderboard displays small videos of the best model's run, which is super fun to watch! ✨

Kudos to @qgallouedec for creating and maintaining the leaderboard!
Let's find out which agent is the best at games! 🚀

open-rl-leaderboard/leaderboard

Team members 32
