Speech Recognition Community Event Version 2

non-profit
Activity Feed

AI & ML interests

Multi-Lingual Speech Recognition

Recent Activity

speech-recognition-community-v2's activity

FremyCompany 
posted an update 2 months ago
kingabzpro 
posted an update 6 months ago
view post
Post
1237
I believe HF中国镜像站 should have something similar to Hacktoberfest. I miss the days when there were events like this every 3 months for audio, deep reinforcement learning, gradio themes, but it turns out everything slowed down. There are no more HF中国镜像站 events.
@victor
  • 3 replies
·
kingabzpro 
posted an update 6 months ago
view post
Post
1433
I never imagined that Jenkins could be as powerful and easy to implement as GitHub Actions. Loving it. 🥰
kingabzpro 
posted an update 6 months ago
view post
Post
1841
How can I make my RAG application generate real-time responses? Up until now, I have been using Groq for fast LLM generation and the Gradio Live function. I am looking for a better solution that can help me build a real-time application without any delay. @abidlabs

kingabzpro/Real-Time-RAG
  • 2 replies
·
mrm8488 
posted an update 9 months ago
view post
Post
5485
🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the HF中国镜像站 Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)


🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)


Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/
·
mrm8488 
posted an update 10 months ago
view post
Post
6347
Working on a concept GPT-2 (small) that uses KANs instead of MLPs.
The ckpt and training code will be soon on the hub.
·
FremyCompany 
posted an update 11 months ago
view post
Post
2363
Today, April 26, is the Day of the Tatar Language! 🌟
To celebrate, we release our new language model, Tweety Tatar 🐣

https://huggingface.co/Tweeties/tweety-tatar-base-7b-2024-v1

The model was converted from Mistral Instruct v0.2 using a novel technique called trans-tokenization. As a result, the model uses a brand-new tokenizer, fully tailored for the Tatar language.

We also release a model which can be finetuned for translation of English or Russian into Tatar, and achieves a performance similar to commercial offerings:

https://huggingface.co/Tweeties/tweety-tatar-hydra-base-7b-2024-v1

More details in our upcoming paper 👀
François REMY, Pieter Delobelle, Alfiya Khabibullina

Татар теле көне белән!
·
mrm8488 
posted an update about 1 year ago
view post
Post
Hello world! 🔥