Nishith Jain's picture

Nishith Jain

KingNish

AI & ML interests

AI is fun actually. Busy till June 2025.

Recent Activity

Organizations

Wikimedia's profile picture OpenGVLab's profile picture Blog-explorers's profile picture Multi🤖Transformers's profile picture The Collectionists's profile picture HelpingAI's profile picture ZeroGPU Explorers's profile picture Project Fluently's profile picture Poscye's profile picture INNOVA AI's profile picture Narra's profile picture Social Post Explorers's profile picture Cognitive Computations's profile picture Dev Mode Explorers's profile picture Stable Diffusion Community (Unofficial, Non-profit)'s profile picture ONNX Community's profile picture HF中国镜像站 Discord Community's profile picture Nerdy Face's profile picture grafite's profile picture None yet's profile picture Project R's profile picture Doge Face's profile picture

KingNish's activity

reacted to AdinaY's post with 🔥🤗 1 day ago
reacted to burtenshaw's post with 🤗 1 day ago
view post
Post
1032
everybody and their dog is fine-tuning Gemma 3 today, so I thought I'd do a longer post on the tips and sharp edges I find. let's go!

1. has to be install everything form main and nightly. this is what I'm working with to get unsloth and TRL running

git+https://github.com/huggingface/transformers@main
git+https://github.com/huggingface/trl.git@main
bitsandbytes
peft


plus this with --no-deps

git+https://github.com/unslothai/unsloth-zoo.git@nightly
git+https://github.com/unslothai/unsloth.git@nightly


2. will brown's code to turn GSM8k into a reasoning dataset is a nice toy experiment https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb

3. with a learning rate of 5e-6 rewards and loss stayed flat for the first 100 or so steps.

4. so far none of my runs have undermined the outputs after 1 epoch. therefore, I'm mainly experimenting with bigger LoRA adapters.

from trl import GRPOConfig

training_args = GRPOConfig(
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 1,
    num_generations = 2,
    max_prompt_length = 256,
    max_completion_length = 1024 - 256,
    num_train_epochs = 1,
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none",
)


5. vision fine-tuning isn't available in TRL's GRPOTrainer, so stick to text datasets. but no need to load the model differently in transformers or Unsloth

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("google/gemma-3-4b-it)


if you want an introduction to GRPO, check out the reasoning course, it walks you through the algorithm, theory, and implementation in a smooth way.

https://huggingface.co/reasoning-course
  • 2 replies
·
reacted to thomwolf's post with 🔥 2 days ago
view post
Post
1942
We've kept pushing our Open-R1 project, an open initiative to replicate and extend the techniques behind DeepSeek-R1.

And even we were mind-blown by the results we got with this latest model we're releasing: ⚡️OlympicCoder ( open-r1/OlympicCoder-7B and open-r1/OlympicCoder-32B)

It's beating Claude 3.7 on (competitive) programming –a domain Anthropic has been historically really strong at– and it's getting close to o1-mini/R1 on olympiad level coding with just 7B parameters!

And the best part is that we're open-sourcing all about its training dataset, the new IOI benchmark, and more in our Open-R1 progress report #3: https://huggingface.co/blog/open-r1/update-3

Datasets are are releasing:
- open-r1/codeforces
- open-r1/codeforces-cots
- open-r1/ioi
- open-r1/ioi-test-cases
- open-r1/ioi-sample-solutions
- open-r1/ioi-cots
- open-r1/ioi-2024-model-solutions
reacted to BrigitteTousi's post with 🤗 2 days ago
view post
Post
3604
Regardless of X being down or not, so glad I can rely on HF Posts for AI news ❤️🤗
  • 1 reply
·
reacted to Smooke's post with 👍 2 days ago
view post
Post
1797
Hallucinations Blog Research Reading List:

Hallucinations Are A Feature of AI, Humans Are The Bug https://hackernoon.com/hallucinations-are-a-feature-of-ai-humans-are-the-bug

Overcome LLM Hallucinations Using Knowledge Bases https://hackernoon.com/overcome-llm-hallucinations-using-knowledge-bases

How to Detect and Minimise Hallucinations in AI Models https://hackernoon.com/how-to-detect-and-minimise-hallucinations-in-ai-models

Predictive Coding, AI: Modeling Placebos in RCTs for Psychedelics and Antidepressants https://hackernoon.com/predictive-coding-ai-modeling-placebos-in-rcts-for-psychedelics-and-antidepressants

A Simple Method to Improving the Accuracy of Your RAG System https://hackernoon.com/say-goodbye-to-ai-hallucinations-a-simple-method-to-improving-the-accuracy-of-your-rag-system

Gen AI Hallucinations: The Good, the Bad, and the Costly https://hackernoon.com/gen-ai-hallucinations-the-good-the-bad-and-the-costly

Why Do LLMs Hallucinate? https://hackernoon.com/why-do-llms-hallucinate

Truth Serum For The AI Age: Factiverse To Fight Fake News And Hallucinations https://hackernoon.com/truth-serum-for-the-ai-age-factiverse-to-fight-fake-news-and-hallucinations

A Secret Technique To Sidestepping LLM Hallucinations https://hackernoon.com/a-secret-technique-to-sidestepping-llm-hallucinations

The Importance of Explainability in AI (XAI) https://hackernoon.com/tackling-ai-hallucinations-the-importance-of-explainability-in-ai-xai

What You Need to Know About Amazon Bedrock’s RAG Evaluation and LLM-as-a-Judge for Advancing AI https://hackernoon.com/what-you-need-to-know-about-amazon-bedrocks-rag-evaluation-and-llm-as-a-judge-for-advancing-ai

I Over Relied on AI and Those Shortcuts Cost Me https://hackernoon.com/i-over-relied-on-ai-and-those-shortcuts-cost-me

AI’s Non-Determinism, Hallucinations, And... Cats? https://hackernoon.com/ais-non-determinism-hallucinations-and-cats

More to read --> https://hackernoon.com/search?query=hallucinations

reacted to JingzeShi's post with 🚀❤️ 4 days ago
reacted to BlinkDL's post with 🔥 4 days ago
view post
Post
4989
RWKV-7 "Goose" 0.4B trained w/ ctx4k automatically extrapolates to ctx32k+, and perfectly solves NIAH ctx16k 🤯 100% RNN and attention-free. Only trained on the Pile. No finetuning. Replicable training runs. tested by our community: https://github.com/Jellyfish042/LongMamba