Speech Recognition Community Event Version 2

non-profit

Activity Feed

AI & ML interests

Multi-Lingual Speech Recognition

Recent Activity

w11wo authored a paper about 15 hours ago

COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition

vumichien authored a paper about 1 month ago

Bridging the Data Provenance Gap Across Text, Speech and Video

morenolq authored a paper about 1 month ago

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction

View all activity

speech-recognition-community-v2's activity

nguyenvulebinh

authored a paper 16 days ago

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models

Paper • 2411.18152 • Published Nov 27, 2024

g8a9

authored a paper about 2 months ago

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Paper • 2501.10057 • Published Jan 17 • 8

FremyCompany

posted an update 2 months ago

Post

612

🔀 Very cool demo of word-level alignment of paraphrased or cross-lingual sentences, from the new Fairly Multilingual ModernBERT embedding model:

Parallia/Fairly-Multilingual-ModernBERT-Token-Alignment

gagan3012

authored a paper 3 months ago

DateLogicQA: Benchmarking Temporal Biases in Large Language Models

Paper • 2412.13377 • Published Dec 17, 2024 • 2

gagan3012

authored a paper 4 months ago

Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks

Paper • 2411.01192 • Published Nov 2, 2024 • 3

patrickvonplaten

authored a paper 5 months ago

Pixtral 12B

Paper • 2410.07073 • Published Oct 9, 2024 • 64

kingabzpro

posted an update 6 months ago

Post

1237

I believe HF中国镜像站 should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, gradio themes, but it turns out everything slowed down. There are no more HF中国镜像站 events.
@victor

3 replies

kingabzpro

posted an update 6 months ago

Post

1433

I never imagined that Jenkins could be as powerful and easy to implement as GitHub Actions. Loving it. 🥰

kingabzpro

posted an update 6 months ago

Post

1841

How can I make my RAG application generate real-time responses? Up until now, I have been using Groq for fast LLM generation and the Gradio Live function. I am looking for a better solution that can help me build a real-time application without any delay. @abidlabs

kingabzpro/Real-Time-RAG

2 replies

nguyenvulebinh

authored a paper 7 months ago

Convoifilter: A case study of doing cocktail party speech recognition

Paper • 2308.11380 • Published Aug 22, 2023 • 1

gagan3012

authored 2 papers 8 months ago

Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic

Paper • 2407.18129 • Published Jul 25, 2024 • 12

Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition

Paper • 2407.13559 • Published Jul 18, 2024 • 17

mrm8488

posted an update 9 months ago

Post

5485

🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the HF中国镜像站 Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)

🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)

Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/

7 replies

mrm8488

posted an update 10 months ago

Post

6347

Working on a concept GPT-2 (small) that uses KANs instead of MLPs.
The ckpt and training code will be soon on the hub.

6 replies

FremyCompany

posted an update 11 months ago

Post

2363

Today, April 26, is the Day of the Tatar Language! 🌟
To celebrate, we release our new language model, Tweety Tatar 🐣

https://huggingface.co/Tweeties/tweety-tatar-base-7b-2024-v1

The model was converted from Mistral Instruct v0.2 using a novel technique called trans-tokenization. As a result, the model uses a brand-new tokenizer, fully tailored for the Tatar language.

We also release a model which can be finetuned for translation of English or Russian into Tatar, and achieves a performance similar to commercial offerings:

https://huggingface.co/Tweeties/tweety-tatar-hydra-base-7b-2024-v1

More details in our upcoming paper 👀
François REMY, Pieter Delobelle, Alfiya Khabibullina

Татар теле көне белән!