The Gemma 3 family is out! Reading the tech report, this section was really interesting to me from a methods/scientific-fairness point of view.
Instead of making over-hyped comparisons, they clearly state that **results are reported in a setup that is advantageous to their models**. (Which everybody does, but people usually don't say.)
For a tech report, it makes a lot of sense to report model performance when the model is used optimally! On leaderboards, on the other hand, comparisons will be apples to apples, but potentially in a suboptimal way for a given model family (much like some users interact sub-optimally with models).
It also contains a cool section (Section 6) on training-data memorization rates! It's important to check whether your model will output training data it has seen verbatim: always an issue for privacy/copyright/..., but also very much for evaluation!
Because if your model knows its evals by heart, you're not testing for generalization.
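A minimal sketch of a verbatim-memorization probe in that spirit: prompt the model with a prefix of a training document and check whether greedy decoding reproduces the true continuation. The 50-token prefix/continuation split and the gpt2 stand-in are illustrative assumptions, not the report's exact protocol.

```python
# Minimal sketch: test whether a model reproduces a training text verbatim.
# Assumptions: any Hugging Face causal LM; 50-token prefix/continuation
# lengths are illustrative, not the Gemma report's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in; swap for the model under test
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def is_memorized(text: str, prefix_len: int = 50, cont_len: int = 50) -> bool:
    ids = tok(text, return_tensors="pt").input_ids[0]
    if len(ids) < prefix_len + cont_len:
        return False  # sample too short to probe
    prefix = ids[:prefix_len].unsqueeze(0)
    target = ids[prefix_len:prefix_len + cont_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=cont_len, do_sample=False)
    generated = out[0, prefix_len:prefix_len + cont_len]
    # Exact match under greedy decoding => the sample counts as memorized.
    return torch.equal(generated, target)
```

Run this over a sample of training documents (and, crucially, over your eval sets) to estimate how much the model can regurgitate.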
Extremely bullish on @CohereForAI's Aya Vision (8B & 32B) - new SOTA open-weight VLMs
- 8B wins up to 81% of the time in its class, better than Gemini Flash
- 32B beats Llama 3.2 90B!
- Covers 23 languages, excels in image captioning, VQA & more
- Integrated into transformers from Day 0!
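A quick-start sketch, assuming the Hub id CohereForAI/aya-vision-8b and a recent transformers release with the image-text-to-text pipeline (check the model card for the exact id and chat template):

```python
# Sketch of trying Aya Vision through the transformers pipeline API.
# The model id and image URL are assumptions; see the model card.
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="CohereForAI/aya-vision-8b")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder image
        {"type": "text", "text": "Describe this image in French."},
    ],
}]
out = pipe(text=messages, max_new_tokens=128)
print(out[0]["generated_text"])  # may include the full chat, depending on version
```

Since it's integrated from Day 0, the standard pipeline API should be all you need, with no custom remote code.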
✨ Multiple content modalities (text, images, video thumbnails)
✨ Rich user interaction data (from Xiaohongshu's 300M+ MAUs, 70%+ search penetration)
✨ Comprehensive evaluation metrics
✨ Support for RAG system development
✨ 6B params with an Apache 2.0 license
✨ Supports Chinese & English prompts of ANY length
✨ Generates Chinese characters within images
✨ Creates images at any resolution within a given range
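The post doesn't name the checkpoint, so here's a generic diffusers sketch with a placeholder Hub id; width/height arguments are how text-to-image pipelines typically expose resolution, and the allowed range depends on the model:

```python
# Generic sketch of arbitrary-resolution text-to-image generation with
# diffusers. "<org>/<model-id>" is a placeholder for the released 6B
# checkpoint; the exact call signature depends on the pipeline class.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "<org>/<model-id>",  # placeholder: fill in the actual Hub id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="一只戴着眼镜的猫在看书 (a cat with glasses reading a book)",
    width=1280,   # non-square resolution, within the model's supported range
    height=768,
).images[0]
image.save("cat.png")
```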