The Gemma 3 family is out! Reading the tech report, this section was really interesting to me from a methods/scientific-fairness point of view.
Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup that is advantageous to their models**. (Which everybody does, but people usually don't say.)
For a tech report, it makes a lot of sense to report model performance when the model is used optimally! On leaderboards, on the other hand, comparisons will be apples to apples, but potentially in a suboptimal way for a given model family (much like some users interact sub-optimally with models).
It also contains a cool section (Section 6) on training-data memorization rates! It's important to check whether your model will output training data it has seen verbatim: always an issue for privacy/copyright/..., but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
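A minimal sketch of a verbatim-memorization probe in that spirit: prompt the model with a prefix of a training document and check whether greedy decoding reproduces the true continuation. The 50-token prefix/continuation split and the gpt2 stand-in are illustrative assumptions, not the report's exact protocol.

```python
# Minimal sketch: test whether a model reproduces a training text verbatim.
# Assumptions: any Hugging Face causal LM; 50-token prefix/continuation
# lengths are illustrative, not the Gemma report's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in; swap for the model under test
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def is_memorized(text: str, prefix_len: int = 50, cont_len: int = 50) -> bool:
    ids = tok(text, return_tensors="pt").input_ids[0]
    if len(ids) < prefix_len + cont_len:
        return False  # sample too short to probe
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + cont_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=cont_len, do_sample=False)
    generated = out[0, prefix_len:prefix_len + cont_len]
    # Exact match under greedy decoding => the sample counts as memorized.
    return torch.equal(generated, target)
```

Run this over a sample of training documents (and, crucially, over your eval sets) to estimate how much the model can regurgitate.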
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated into transformers from Day 0!
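A quick-start sketch, assuming the Hub id CohereForAI/aya-vision-8b and a recent transformers release with the image-text-to-text pipeline (check the model card for the exact id and chat template):

```python
# Sketch of trying Aya Vision through the transformers pipeline API.
# The model id and image URL are assumptions; see the model card.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereForAI/aya-vision-8b")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image in French."},
    ],
}]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])  # may include the full chat, depending on version
```

Since it's integrated from Day 0, the standard pipeline API should be all you need, with no custom remote code.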
✨ Multiple content modalities (text, images, video thumbnails)
✨ Rich user interaction data (from Xiaohongshu's 300M+ MAUs, 70%+ search penetration)
✨ Comprehensive evaluation metrics
✨ Support for RAG system development
✨ 6B params with an Apache 2.0 license
✨ Supports Chinese & English prompts of ANY length
✨ Generates Chinese characters within images
✨ Creates images at any resolution within a given range
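The post doesn't name the checkpoint, so here's a generic diffusers sketch with a placeholder Hub id; width/height arguments are how text-to-image pipelines typically expose resolution, and the allowed range depends on the model:

```python
# Generic sketch of arbitrary-resolution text-to-image generation with
# diffusers. "<org>/<model-id>" is a placeholder for the released 6B
# checkpoint; the exact call signature depends on the pipeline class.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "<org>/<model-id>",  # placeholder: fill in the actual Hub id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="一只戴着眼镜的猫在看书 (a cat with glasses reading a book)",
    width=1280,   # non-square resolution, within the model's supported range
    height=768,
).images[0]
image.save("cat.png")
```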