John Smith PRO
John6666
AI & ML interests
None yet
Recent Activity
liked
a model
12 minutes ago
xavelche/DeepSeek-R1-Agri-QA
liked
a model
12 minutes ago
mradermacher/DeepSeek-R1-Agri-QA-GGUF
liked
a model
16 minutes ago
rana-shahroz/gemma-2-9b-openorca20k
Organizations
John6666's activity

reacted to
clem's
post with 🤗
about 17 hours ago

reacted to
csabakecskemeti's
post with 👍
about 17 hours ago
Post
325
Cohere Command-a Q2 quant
DevQuasar/CohereForAI.c4ai-command-a-03-2025-GGUF
6.7 t/s on a 3-GPU setup (4080 + 2x3090)
(q3, q4 currently uploading)

reacted to
AdinaY's
post with 🔥
about 17 hours ago
Post
626
SEA-VL🔥 an OPEN dataset bridging the AI culture gap in Southeast Asia!
Paper: Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (2503.07920)
✨1.28M+ culturally relevant images
✨85% accuracy in auto-collected images
✨Tracking underrepresented SEA languages & traditions

reacted to
burtenshaw's
post with 👍
about 17 hours ago
Post
608
Still speed running Gemma 3 to think. Today I focused on setting up gpu poor hardware to run GRPO.
This is a plain TRL and PEFT notebook which works on Mac silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.
🧑🍳 There’s more still in the oven like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.
Here’s a link to the 1b notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
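The "reasoning version of GSM8K" step can be sketched in a few lines. The template below is a guess at the usual GRPO-style format (a `<think>`/`<answer>` chat prompt), not the notebook's actual schema:

```python
# Hypothetical sketch: wrap a GSM8K question in a chat-format prompt that
# asks the model to separate reasoning from the final answer. The exact
# system prompt and field names are assumptions for illustration.
SYSTEM = ("Respond in the following format:\n"
          "<think>step-by-step reasoning</think>\n"
          "<answer>final numeric answer</answer>")

def to_reasoning_example(question: str, answer: str) -> dict:
    """Convert one GSM8K row into a chat-format training example."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
        "answer": answer,  # gold answer, used by the reward function
    }

ex = to_reasoning_example("If 3 pens cost 6 dollars, how much do 5 pens cost?", "10")
```

A reward function can then score completions by whether the text inside `<answer>` matches the gold answer.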

reacted to
nicolay-r's
post with 🔥
about 17 hours ago
Post
614
📢 With the recent release of Gemma-3, if you're interested in playing with textual chain-of-thought, the notebook below is a wrapper over the model (native transformers inference API) for passing a predefined schema of prompts in batching mode.
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb
Limitation: the schema supports text only (for now), while Gemma-3 is a text+image-to-text model.
Model: google/gemma-3-1b-it
Provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_gemma3.py
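The batching mode such a wrapper provides reduces to a chunk-and-map loop over the prompt list. A minimal sketch, with a stub `run_model` standing in for the real Gemma-3 transformers call:

```python
# Minimal sketch of batched inference: split schema-formatted prompts into
# fixed-size batches and map a batch-level inference callable over each one.
# `run_model` is a placeholder, not the actual transformers forward pass.
from typing import Callable, Iterable, List

def iter_batches(items: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield successive fixed-size chunks of the prompt list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def batched_infer(prompts: List[str],
                  run_model: Callable[[List[str]], List[str]],
                  batch_size: int = 4) -> List[str]:
    """Apply the inference function batch by batch, preserving order."""
    outputs: List[str] = []
    for batch in iter_batches(prompts, batch_size):
        outputs.extend(run_model(batch))
    return outputs

# Stub inference function in place of a Gemma-3 generate() call.
demo = batched_infer([f"prompt {i}" for i in range(10)],
                     run_model=lambda batch: [p.upper() for p in batch],
                     batch_size=4)
```

In the real provider, `run_model` would tokenize the batch and call the model's `generate` once per chunk, which is where the throughput win over per-prompt calls comes from.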

reacted to
fdaudens's
post with 🔥
about 23 hours ago
Post
755
Ever wanted 45 min with one of AI’s most fascinating minds? Was with
@thomwolf
at HumanX Vegas. Sharing my notes of his Q&A with the press—completely changed how I think about AI’s future:
1️⃣ The next wave of successful AI companies won’t be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but that’s rarely the only reason we buy one. We expect it to work well, and that’s enough. LLMs will be the same."
2️⃣ Big players are pivoting: "Closed-source companies—OpenAI being the first—have largely shifted from LLM announcements to product announcements."
3️⃣ Open source is changing everything: "DeepSeek was open source AI’s ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for free—and it’s just as good as the paid ones."
4️⃣ Product innovation is being democratized: Take Manus, for example—they built a product on top of Anthropic’s models that’s "actually better than Anthropic’s own product for now, in terms of agents." This proves that anyone can build great products with existing models.
We’re entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily available—just look at the flurry of daily new releases on HF中国镜像站.
Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."
Love to hear your thoughts on this shift!

reacted to
prithivMLmods's
post with 🔥
about 23 hours ago
Post
1935
Gemma-3-4B : Image and Video Inference 🖼️🎥
🧤Space: prithivMLmods/Gemma-3-Multimodal
@gemma3 : {Tag + Space + 'prompt'}
@video-infer : {Tag + Space + 'prompt'}
+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

reacted to
AtAndDev's
post with 🔥
about 23 hours ago
Post
1122
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.

reacted to
eaddario's
post with 👍
about 23 hours ago
Post
286
Experimental
eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF is now available.
Got close to achieving a 10% reduction in size, but the quality started to deteriorate, so this version has been pruned more conservatively. Sizes, on average, are about 8% smaller with only a very small penalty (< 1%) in quality.
After trying with different models and parameter counts, I suspect the best I'll be able to do with the current process is between 6 and 8% reduction, so I've decided to try a different approach. I'll publish the process and findings next.
For background: https://huggingface.co/posts/eaddario/832567461491467

reacted to
jsulz's
post with 🚀
about 23 hours ago
Post
838
It's finally here ❤️
Build faster than ever with lightning fast upload and download speeds starting today on the Hub ⚡
Xet storage is rolling out access across the Hub - join the waitlist here https://huggingface.co/join/xet
You can apply for yourself, or your entire organization. Head over to your account settings for more information or join anywhere you see the Xet logo on a repository you know.
Have questions? Join the conversation below 👇 or open a discussion on the Xet team page xet-team/README

reacted to
Mertpy's
post with 👍
about 23 hours ago
Post
1017
Dynamic Intuition-Based Reasoning (DIBR)
https://huggingface.co/blog/Veyllo/dynamic-intuition-based-reasoning
Do you guys think this approach has potential?
The idea is to combine rapid, non-analytical pattern recognition (intuition) with traditional analytical reasoning to help AI systems handle "untrained" problems more effectively. It’s still a theoretical framework.

reacted to
MonsterMMORPG's
post with 🚀
about 23 hours ago
Post
1262
I just pushed another amazing update to our Wan 2.1 APP. LoRA loading for 14B Wan 2.1 models was taking over 15 minutes; it's now optimized to take only a few seconds. Fully supports the RTX 5000 series and fully optimized for both VRAM and RAM.
Our APP here : https://www.patreon.com/posts/wan-2-1-ultra-as-123105403
Tutorial 1 : https://youtu.be/hnAhveNy-8s
Tutorial 2 : https://youtu.be/ueMrzmbdWBg
It is also pushed to the original repo; you can see the pull request here: https://github.com/modelscope/DiffSynth-Studio/pull/442

reacted to
ginipick's
post with 🔥
about 23 hours ago
Post
2255
🌐 GraphMind: Phi-3 Instruct Graph Explorer
✨ Extract and visualize knowledge graphs from any text in multiple languages!
GraphMind is a powerful tool that leverages the capabilities of Phi-3 to transform unstructured text into structured knowledge graphs, helping you understand complex relationships within any content.
ginigen/Graph-Mind
🚀 Key Features
Multi-language Support 🌍: Process text in English, Korean, and many other languages
Instant Visualization 🧩: See extracted entities and relationships in an interactive graph
Entity Recognition 🏷️: Automatically identifies and categorizes named entities
Optimized Performance ⚡: Uses caching to deliver faster results for common examples
Intuitive Interface 👆: Simple design makes complex graph extraction accessible to everyone
💡 Use Cases
Content Analysis: Extract key entities and relationships from articles or documents
Research Assistance: Quickly visualize connections between concepts in research papers
Educational Tool: Help students understand the structure of complex texts
Multilingual Processing: Extract knowledge from content in various languages
🔧 How It Works
Enter any text in the input field
Select a model from the dropdown
Click "Extract & Visualize"
Explore the interactive knowledge graph and entity recognition results
GraphMind bridges the gap between raw text and structured knowledge, making it easier to identify patterns, extract insights, and understand relationships within any content. Try it now and transform how you interact with textual information!
#NLP #KnowledgeGraph #TextAnalysis #Visualization #Phi3 #MultilingualAI
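The extract-then-visualize pipeline boils down to collecting (subject, relation, object) triples and building an adjacency view of them. A toy sketch of that data flow, where the tiny rule-based "extractor" is only a placeholder for the Phi-3 model:

```python
# Toy illustration of the text -> knowledge-graph step: represent extracted
# (subject, relation, object) triples and build an adjacency list. The naive
# pattern matcher below stands in for the real Phi-3 extraction model.
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def extract_triples(text: str) -> List[Triple]:
    """Naive rule: split 'A <relation> B' sentences on ' is ' / ' uses '."""
    triples: List[Triple] = []
    for sentence in text.split("."):
        for rel in (" is ", " uses "):
            if rel in sentence:
                subj, obj = sentence.split(rel, 1)
                triples.append((subj.strip(), rel.strip(), obj.strip()))
    return triples

def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: entity -> [(relation, neighbour), ...]."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)

triples = extract_triples("GraphMind is a tool. GraphMind uses Phi-3")
graph = to_graph(triples)
```

The interactive visualization then just renders this adjacency structure as nodes and labeled edges.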

reacted to
AdinaY's
post with 🤗
about 23 hours ago
Post
876
Open Sora 2.0 is out 🔥
hpcai-tech/open-sora-20-67cfb7efa80a73999ccfc2d5
✨ 11B with Apache2.0
✨ Low training cost - $200k
✨ open weights, code and training workflow

reacted to
Jaward's
post with 👀
about 23 hours ago
Post
786
This is the most exciting of this week’s release for me: Gemini Robotics - A SOTA generalist Vision-Language-Action model that brings intelligence to the physical world. It comes with a verifiable real-world knowledge Embodied Reasoning QA benchmark. Cool part is that the model can be specialized with fast adaptation to new tasks and have such adaptations transferred to new robot embodiment like humanoids. Looking forward to the model and data on hf, it’s about time I go full physical:)
Technical Report: https://storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf

reacted to
burtenshaw's
post with 🤗
about 23 hours ago
Post
942
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!
1. you have to install everything from main and nightly. this is what I'm working with to get Unsloth and TRL running:
git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft
plus this with
--no-deps
git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
3. with a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.
4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.
from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
if you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.
https://huggingface.co/reasoning-course

reacted to
jasoncorkill's
post with 🔥
2 days ago
Post
2089
Benchmarking Google's Veo2: How Does It Compare?
Google recently launched Veo2, its latest text-to-video model, through select partners like fal.ai. As part of our ongoing evaluation of state-of-the-art generative video models, we rigorously benchmarked Veo2 against industry leaders.
We generated a large set of Veo2 videos, spending hundreds of dollars in the process, and systematically evaluated them using our Python-based API for human and automated labeling.
The results did not meet expectations. Veo2 struggled with style consistency and temporal coherence, falling behind competitors like Runway, Pika, Tencent, and even Alibaba. While the model shows promise, its alignment and quality are not yet there.
Check out the ranking here: https://www.rapidata.ai/leaderboard/video-models
Rapidata/text-2-video-human-preferences-veo2
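The human-preference leaderboard idea reduces to aggregating pairwise judgments into a ranking. A hypothetical sketch of that aggregation (not Rapidata's actual API; "ModelX" is a made-up model name used only to fill out the example):

```python
# Hypothetical sketch: turn pairwise human preferences into a leaderboard by
# counting wins per model and sorting by win rate. This illustrates the
# benchmarking idea, not Rapidata's real labeling pipeline.
from collections import Counter
from typing import List, Tuple

def win_rates(comparisons: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """comparisons: (winner, loser) pairs. Returns models sorted by win rate."""
    wins, games = Counter(), Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = [(model, wins[model] / games[model]) for model in games]
    return sorted(rates, key=lambda x: x[1], reverse=True)

# Toy data: each tuple records which model's video a rater preferred.
ranking = win_rates([("Runway", "Veo2"), ("Pika", "Veo2"), ("Veo2", "ModelX")])
```

Production leaderboards usually replace the raw win rate with something like a Bradley-Terry or Elo fit, which accounts for the strength of each opponent rather than treating all matchups equally.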