Steven Zheng

Steveeeeeeen

AI & ML interests

speech & audio

Recent Activity

updated a dataset about 22 hours ago

Steveeeeeeen/mls_eng_10k

liked a Space about 24 hours ago

sesame/csm-1b

updated a dataset 1 day ago

Steveeeeeeen/whisper-leaderboard-evals

Steveeeeeeen's activity

updated a dataset about 22 hours ago

Steveeeeeeen/mls_eng_10k

Updated about 22 hours ago • 47

liked a Space about 24 hours ago

295

Sesame CSM

🌱

Conversational speech generation

updated a dataset 1 day ago

Steveeeeeeen/whisper-leaderboard-evals

Preview • Updated 1 day ago • 440 • 1

upvoted 2 articles 1 day ago

Article

Open-R1: Update #1

and 7 others •

Feb 2

• 295

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 808

updated a model 3 days ago

Steveeeeeeen/Phi-4-multimodal-Emilia-French

Text Generation • Updated 3 days ago • 6

published a model 3 days ago

Steveeeeeeen/Phi-4-multimodal-Emilia-French

Text Generation • Updated 3 days ago • 6

reacted to eliebak's post with 🔥 3 days ago

Post

1356

Google just dropped an exciting technical report for the brand-new Gemma3 model! 🚀 Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:

1) Architecture choices:
> No more softcapping, replaced by QK-Norm
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with 5:1 and 1024 (very small and cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!

2) Long context
> Only increase the RoPE base frequency in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models, no 128k for the 1B
> Pretrained with 32k context? Seems very high
> No YaRN nor Llama 3-like RoPE extension

3) Distillation
> Only keep the first 256 logits for the teacher
> Ablation on the teacher gap (tl;dr you need some "patience" to see that using a small teacher is better)
> On-policy distillation yeahh (by @agarwl_ et al), not sure if the teacher gap behaves the same here, curious if someone has more info?

4) Others
> Checkpoint with QAT, that's very cool
> RL using improved versions of BOND, WARM/WARP, a good excuse to look at @ramealexandre's papers
> Only use ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma2
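
To make the "SWA with 5:1 and 1024" note concrete, here is a minimal pure-Python sketch. It reads "5:1" as five sliding-window (local) layers for every global layer and "1024" as the window size in tokens; that reading, the layer ordering, and the helper names `layer_kinds` and `causal_mask` are assumptions for illustration, not code from the report.

```python
def layer_kinds(num_layers, local_per_global=5):
    """Label each layer 'local' or 'global'.

    Assumption: every (local_per_global + 1)-th layer is global,
    and all the others use sliding-window attention.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


def causal_mask(seq_len, window=None):
    """mask[q][k] is True when query position q may attend to key position k.

    window=None gives a full causal mask (global layer).
    window=w additionally restricts attention to the last w positions,
    counting the current one (local / sliding-window layer).
    """
    return [
        [k <= q and (window is None or q - k < window) for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Tiny sizes so the masks are easy to inspect; the post mentions a window of 1024.
kinds = layer_kinds(12)
local = causal_mask(6, window=3)
full = causal_mask(6)
```

In a local layer, only the last `window` keys need to stay in the KV cache, which is probably the saving the post has in mind when it says SWA makes MLA unnecessary.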

1 reply

upvoted an article 3 days ago

Article

Open R1: Update #3

and 9 others •

4 days ago

• 217

published a dataset 4 days ago

Steveeeeeeen/mls_eng_10k

Updated about 22 hours ago • 47

updated a dataset 5 days ago

Steveeeeeeen/expresso

Updated 5 days ago • 39

published a dataset 5 days ago

Steveeeeeeen/expresso

Updated 5 days ago • 39

upvoted an article 5 days ago

Article

Introducing EuroBERT: A High-Performance Multilingual Encoder Model

and 3 others •

5 days ago

• 121

upvoted an article 8 days ago

Article

LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone!

8 days ago

• 32

liked a model 10 days ago

ysdede/Phi-4-mm-inst-asr-turkish

Text Generation • Updated 12 days ago • 167 • 4

New activity in hf-audio/open_asr_leaderboard 11 days ago

Adding Phi-4-Multimodal

#32 opened 11 days ago by

Steveeeeeeen

updated a Space 11 days ago

671

Open ASR Leaderboard

🏆

Request evaluation of a speech recognition model

commented on From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages 11 days ago

Hey!
You're on the right track! The output you're seeing is still tokenized with special markers. To convert this into natural language text, you need to use your tokenizer's decode function.
You can find an example here: https://huggingface.co/HKUSTAudio/Llasa-1B
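
As a rough illustration of the decoding step described above, the sketch below uses a toy stand-in tokenizer so it runs without downloading a model. The class `ToyTokenizer`, its vocabulary, and the marker strings are hypothetical. With a Hugging Face `transformers` tokenizer the equivalent call would be `tokenizer.decode(ids, skip_special_tokens=True)`.

```python
# Hypothetical special markers; a real tokenizer defines its own set.
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>"}


class ToyTokenizer:
    """Toy stand-in that maps token ids back to strings."""

    def __init__(self, vocab):
        self.id_to_token = dict(enumerate(vocab))

    def decode(self, ids, skip_special_tokens=False):
        tokens = [self.id_to_token[i] for i in ids]
        if skip_special_tokens:
            # Drop the markers so only the natural-language tokens remain.
            tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
        return " ".join(tokens)


tokenizer = ToyTokenizer(["<s>", "</s>", "<pad>", "hello", "world"])
generated_ids = [0, 3, 4, 1, 2, 2]  # made-up model output

raw = tokenizer.decode(generated_ids)
clean = tokenizer.decode(generated_ids, skip_special_tokens=True)
print(raw)    # markers still visible
print(clean)  # markers stripped
```

Seeing special markers in the output usually means the ids were decoded without skipping special tokens, or not decoded at all.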

liked a Space 12 days ago

Find a leaderboard

🔍

Explore and discover all leaderboards from the HF community

New activity in microsoft/Phi-4-multimodal-instruct 12 days ago

Make Phi4MMForCausalLM.forward's num_logits_to_keep actually optional

#20 opened 12 days ago by

phh