John Smith PRO
John6666
AI & ML interests
None yet
Recent Activity
liked
a model
12 minutes ago
xavelche/DeepSeek-R1-Agri-QA
liked
a model
12 minutes ago
mradermacher/DeepSeek-R1-Agri-QA-GGUF
liked
a model
16 minutes ago
rana-shahroz/gemma-2-9b-openorca20k
Organizations
John6666's activity

reacted to
clem's
post with 🤗
about 17 hours ago

reacted to
csabakecskemeti's
post with 👍
about 17 hours ago
Post
325
Cohere Command-a Q2 quant
DevQuasar/CohereForAI.c4ai-command-a-03-2025-GGUF
6.7 t/s on a 3-GPU setup (4080 + 2x3090)
(q3, q4 currently uploading)

reacted to
AdinaY's
post with 🔥
about 17 hours ago
Post
626
SEA-VL🔥 an OPEN dataset bridging the AI culture gap in Southeast Asia!
Paper: Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia (2503.07920)
✨1.28M+ culturally relevant images
✨85% accuracy in auto-collected images
✨Tracking underrepresented SEA languages & traditions

reacted to
burtenshaw's
post with 👍
about 17 hours ago
Post
608
Still speed running Gemma 3 to think. Today I focused on setting up gpu poor hardware to run GRPO.
This is a plain TRL and PEFT notebook which works on Mac silicon or a Colab T4. It uses the 1B variant of Gemma 3 and a reasoning version of the GSM8K dataset.
🧑🍳 There’s more still in the oven like releasing models, an Unsloth version, and deeper tutorials, but hopefully this should bootstrap your projects.
Here’s a link to the 1b notebook: https://colab.research.google.com/drive/1mwCy5GQb9xJFSuwt2L_We3eKkVbx2qSt?usp=sharing
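The "reasoning version of GSM8K" step can be sketched in a few lines. The template below is a guess at the usual GRPO-style format (a `<think>`/`<answer>` chat prompt), not the notebook's actual schema:

```python
# Hypothetical sketch: wrap a GSM8K question in a chat-format prompt that
# asks the model to separate reasoning from the final answer. The exact
# system prompt and field names are assumptions for illustration.
SYSTEM = ("Respond in the following format:\n"
          "<think>step-by-step reasoning</think>\n"
          "<answer>final numeric answer</answer>")

def to_reasoning_example(question: str, answer: str) -> dict:
    """Convert one GSM8K row into a chat-format training example."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
        "answer": answer,  # gold answer, used by the reward function
    }

ex = to_reasoning_example("If 3 pens cost 6 dollars, how much do 5 pens cost?", "10")
```

A reward function can then score completions by whether the text inside `<answer>` matches the gold answer.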

reacted to
nicolay-r's
post with 🔥
about 17 hours ago
Post
614
📢 With the recent release of Gemma-3, if you're interested in playing with textual chain-of-thought, the notebook below is a wrapper over the model (native transformers inference API) for passing a predefined schema of prompts in batching mode.
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_gemma_3.ipynb
Limitation: the schema supports text only (for now), while Gemma-3 is a text+image-to-text model.
Model: google/gemma-3-1b-it
Provider: https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_gemma3.py
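The batching mode such a wrapper provides reduces to a chunk-and-map loop over the prompt list. A minimal sketch, with a stub `run_model` standing in for the real Gemma-3 transformers call:

```python
# Minimal sketch of batched inference: split schema-formatted prompts into
# fixed-size batches and map a batch-level inference callable over each one.
# `run_model` is a placeholder, not the actual transformers forward pass.
from typing import Callable, Iterable, List

def iter_batches(items: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield successive fixed-size chunks of the prompt list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def batched_infer(prompts: List[str],
                  run_model: Callable[[List[str]], List[str]],
                  batch_size: int = 4) -> List[str]:
    """Apply the inference function batch by batch, preserving order."""
    outputs: List[str] = []
    for batch in iter_batches(prompts, batch_size):
        outputs.extend(run_model(batch))
    return outputs

# Stub inference function in place of a Gemma-3 generate() call.
demo = batched_infer([f"prompt {i}" for i in range(10)],
                     run_model=lambda batch: [p.upper() for p in batch],
                     batch_size=4)
```

In the real provider, `run_model` would tokenize the batch and call the model's `generate` once per chunk, which is where the throughput win over per-prompt calls comes from.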

reacted to
fdaudens's
post with 🔥
about 23 hours ago
Post
755
Ever wanted 45 min with one of AI’s most fascinating minds? Was with
@thomwolf
at HumanX Vegas. Sharing my notes of his Q&A with the press—completely changed how I think about AI’s future:
1️⃣ The next wave of successful AI companies won’t be defined by who has the best model but by who builds the most useful real-world solutions. "We all have engines in our cars, but that’s rarely the only reason we buy one. We expect it to work well, and that’s enough. LLMs will be the same."
2️⃣ Big players are pivoting: "Closed-source companies—OpenAI being the first—have largely shifted from LLM announcements to product announcements."
3️⃣ Open source is changing everything: "DeepSeek was open source AI’s ChatGPT moment. Basically, everyone outside the bubble realized you can get a model for free—and it’s just as good as the paid ones."
4️⃣ Product innovation is being democratized: Take Manus, for example—they built a product on top of Anthropic’s models that’s "actually better than Anthropic’s own product for now, in terms of agents." This proves that anyone can build great products with existing models.
We’re entering a "multi-LLM world," where models are becoming commoditized, and all the tools to build are readily available—just look at the flurry of daily new releases on HF中国镜像站.
Thom's comparison to the internet era is spot-on: "In the beginning you made a lot of money by making websites... but nowadays the huge internet companies are not the companies that built websites. Like Airbnb, Uber, Facebook, they just use the internet as a medium to make something for real life use cases."
Love to hear your thoughts on this shift!

reacted to
prithivMLmods's
post with 🔥
about 23 hours ago
Post
1935
Gemma-3-4B : Image and Video Inference 🖼️🎥
🧤Space: prithivMLmods/Gemma-3-Multimodal
@gemma3 : {Tag + Space + 'prompt'}
@video-infer : {Tag + Space + 'prompt'}
+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf

reacted to
AtAndDev's
post with 🔥
about 23 hours ago
Post
1122
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.

reacted to
eaddario's
post with 👍
about 23 hours ago
Post
286
Experimental
eaddario/DeepSeek-R1-Distill-Llama-8B-GGUF is now available.
Got close to achieving a 10% reduction in size, but the quality started to deteriorate, so this version has been pruned more conservatively. Sizes, on average, are about 8% smaller with only a very small penalty (< 1%) in quality.
After trying with different models and parameter counts, I suspect the best I'll be able to do with the current process is between 6 and 8% reduction, so I've decided to try a different approach. I'll publish the process and findings next.
For background: https://huggingface.co/posts/eaddario/832567461491467

reacted to
jsulz's
post with 🚀
about 23 hours ago
Post
838
It's finally here ❤️
Build faster than ever with lightning fast upload and download speeds starting today on the Hub ⚡
Xet storage is rolling out access across the Hub - join the waitlist here https://huggingface.co/join/xet
You can apply for yourself, or your entire organization. Head over to your account settings for more information or join anywhere you see the Xet logo on a repository you know.
Have questions? Join the conversation below 👇 or open a discussion on the Xet team page xet-team/README

reacted to
Mertpy's
post with 👍
about 23 hours ago
Post
1017
Dynamic Intuition-Based Reasoning (DIBR)
https://huggingface.co/blog/Veyllo/dynamic-intuition-based-reasoning
Do you guys think this approach has potential?
The idea is to combine rapid, non-analytical pattern recognition (intuition) with traditional analytical reasoning to help AI systems handle "untrained" problems more effectively. It’s still a theoretical framework.

reacted to
MonsterMMORPG's
post with 🚀
about 23 hours ago
Post
1262
I just pushed another amazing update to our Wan 2.1 APP. LoRA loading for 14B Wan 2.1 models was taking over 15 minutes; it's now optimized to take only a few seconds. Fully supports the RTX 5000 series and fully optimized for both VRAM and RAM.
Our APP here : https://www.patreon.com/posts/wan-2-1-ultra-as-123105403
Tutorial 1 : https://youtu.be/hnAhveNy-8s
Tutorial 2 : https://youtu.be/ueMrzmbdWBg
It is also pushed to the original repo; you can see the pull request here: https://github.com/modelscope/DiffSynth-Studio/pull/442

reacted to
ginipick's
post with 🔥
about 23 hours ago
Post
2255
🌐 GraphMind: Phi-3 Instruct Graph Explorer
✨ Extract and visualize knowledge graphs from any text in multiple languages!
GraphMind is a powerful tool that leverages the capabilities of Phi-3 to transform unstructured text into structured knowledge graphs, helping you understand complex relationships within any content.
ginigen/Graph-Mind
🚀 Key Features
Multi-language Support 🌍: Process text in English, Korean, and many other languages
Instant Visualization 🧩: See extracted entities and relationships in an interactive graph
Entity Recognition 🏷️: Automatically identifies and categorizes named entities
Optimized Performance ⚡: Uses caching to deliver faster results for common examples
Intuitive Interface 👆: Simple design makes complex graph extraction accessible to everyone
💡 Use Cases
Content Analysis: Extract key entities and relationships from articles or documents
Research Assistance: Quickly visualize connections between concepts in research papers
Educational Tool: Help students understand the structure of complex texts
Multilingual Processing: Extract knowledge from content in various languages
🔧 How It Works
Enter any text in the input field
Select a model from the dropdown
Click "Extract & Visualize"
Explore the interactive knowledge graph and entity recognition results
GraphMind bridges the gap between raw text and structured knowledge, making it easier to identify patterns, extract insights, and understand relationships within any content. Try it now and transform how you interact with textual information!
#NLP #KnowledgeGraph #TextAnalysis #Visualization #Phi3 #MultilingualAI
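The extract-then-visualize pipeline boils down to collecting (subject, relation, object) triples and building an adjacency view of them. A toy sketch of that data flow, where the tiny rule-based "extractor" is only a placeholder for the Phi-3 model:

```python
# Toy illustration of the text -> knowledge-graph step: represent extracted
# (subject, relation, object) triples and build an adjacency list. The naive
# pattern matcher below stands in for the real Phi-3 extraction model.
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def extract_triples(text: str) -> List[Triple]:
    """Naive rule: split 'A <relation> B' sentences on ' is ' / ' uses '."""
    triples: List[Triple] = []
    for sentence in text.split("."):
        for rel in (" is ", " uses "):
            if rel in sentence:
                subj, obj = sentence.split(rel, 1)
                triples.append((subj.strip(), rel.strip(), obj.strip()))
    return triples

def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Adjacency list: entity -> [(relation, neighbour), ...]."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return dict(graph)

triples = extract_triples("GraphMind is a tool. GraphMind uses Phi-3")
graph = to_graph(triples)
```

The interactive visualization then just renders this adjacency structure as nodes and labeled edges.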

reacted to
AdinaY's
post with 🤗
about 23 hours ago
Post
876
Open Sora 2.0 is out 🔥
hpcai-tech/open-sora-20-67cfb7efa80a73999ccfc2d5
✨ 11B with Apache2.0
✨ Low training cost - $200k
✨ open weights, code and training workflow

reacted to
Jaward's
post with 👀
about 23 hours ago
Post
786
This is the most exciting of this week’s release for me: Gemini Robotics - A SOTA generalist Vision-Language-Action model that brings intelligence to the physical world. It comes with a verifiable real-world knowledge Embodied Reasoning QA benchmark. Cool part is that the model can be specialized with fast adaptation to new tasks and have such adaptations transferred to new robot embodiment like humanoids. Looking forward to the model and data on hf, it’s about time I go full physical:)
Technical Report: https://storage.googleapis.com/deepmind-media/gemini-robotics/gemini_robotics_report.pdf

reacted to
burtenshaw's
post with 🤗
about 23 hours ago
Post
942
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!
1. you have to install everything from main and nightly. this is what I'm working with to get Unsloth and TRL running:
git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft
plus this with
--no-deps
git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly
2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
3. with a learning rate of 5e-6, rewards and loss stayed flat for the first 100 or so steps.
4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.
from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)
5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it")
if you want an introduction to GRPO, check out the reasoning course; it walks you through the algorithm, theory, and implementation in a smooth way.
https://huggingface.co/reasoning-course

reacted to
jasoncorkill's
post with 🔥
2 days ago
Post
2089
Benchmarking Google's Veo2: How Does It Compare?
Google recently launched Veo2, its latest text-to-video model, through select partners like fal.ai. As part of our ongoing evaluation of state-of-the-art generative video models, we rigorously benchmarked Veo2 against industry leaders.
We generated a large set of Veo2 videos, spending hundreds of dollars in the process, and systematically evaluated them using our Python-based API for human and automated labeling.
The results did not meet expectations. Veo2 struggled with style consistency and temporal coherence, falling behind competitors like Runway, Pika, Tencent, and even Alibaba. While the model shows promise, its alignment and quality are not yet there.
Check out the ranking here: https://www.rapidata.ai/leaderboard/video-models
Rapidata/text-2-video-human-preferences-veo2
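The human-preference leaderboard idea reduces to aggregating pairwise judgments into a ranking. A hypothetical sketch of that aggregation (not Rapidata's actual API; "ModelX" is a made-up model name used only to fill out the example):

```python
# Hypothetical sketch: turn pairwise human preferences into a leaderboard by
# counting wins per model and sorting by win rate. This illustrates the
# benchmarking idea, not Rapidata's real labeling pipeline.
from collections import Counter
from typing import List, Tuple

def win_rates(comparisons: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """comparisons: (winner, loser) pairs. Returns models sorted by win rate."""
    wins, games = Counter(), Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = [(model, wins[model] / games[model]) for model in games]
    return sorted(rates, key=lambda x: x[1], reverse=True)

# Toy data: each tuple records which model's video a rater preferred.
ranking = win_rates([("Runway", "Veo2"), ("Pika", "Veo2"), ("Veo2", "ModelX")])
```

Production leaderboards usually replace the raw win rate with something like a Bradley-Terry or Elo fit, which accounts for the strength of each opponent rather than treating all matchups equally.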