
Google just casually disrupted the open-source AI narrative…

Fireship · April 8, 2026

📝 Summary

Google has significantly disrupted the open-source AI landscape with the release of **Gemma 4**, a large language model licensed under **Apache 2.0**, offering true freedom for commercial use. This release stands out because **Gemma 4** is surprisingly small, enabling deployment on consumer **GPUs** and even mobile devices, while maintaining performance comparable to much larger models.

The breakthrough in **Gemma 4's** efficiency is attributed to several Google innovations:

* **Turboquant**: A novel quantization technique that compresses model weights more effectively than traditional methods. It achieves this by:
  * Converting data from Cartesian to polar coordinates, leveraging predictable angle patterns for efficient storage.
  * Employing the **Johnson-Lindenstrauss transform** to shrink high-dimensional data down to single sign bits while approximately preserving the distances between data points.
* **Per-layer embeddings**: Indicated by an "E" in model names like **E2B** and **E4B**, this feature provides each neural network layer with its own contextual "cheat sheet" for tokens. This allows information to be introduced precisely when needed, rather than being carried through the entire model, drastically reducing memory overhead.
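The sign-bit idea above can be illustrated with a minimal sketch (this is *not* Turboquant itself, just the classic random-projection trick it builds on, sometimes called SimHash): project vectors through a random Gaussian matrix, keep only the sign of each coordinate, and the fraction of bits on which two sketches disagree approximates the angle between the original vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def jl_sign_sketch(X, out_dim, rng):
    """Project rows of X with a random Gaussian (JL-style) matrix,
    then keep only one sign bit per projected coordinate."""
    d = X.shape[1]
    R = rng.normal(size=(d, out_dim)) / np.sqrt(out_dim)
    return (X @ R) > 0  # boolean matrix: 1 bit per coordinate

def bit_disagreement(a, b):
    """Fraction of differing sign bits; approximately angle(a, b) / pi."""
    return float(np.mean(a != b))

# Two similar vectors and one unrelated vector
x = rng.normal(size=256)
y = x + 0.1 * rng.normal(size=256)  # small perturbation of x
z = rng.normal(size=256)            # independent of x

bits = jl_sign_sketch(np.stack([x, y, z]), out_dim=1024, rng=rng)

near = bit_disagreement(bits[0], bits[1])  # small: x and y are close
far = bit_disagreement(bits[0], bits[2])   # roughly 0.5: x and z are unrelated
print(f"near: {near:.3f}  far: {far:.3f}")
```

Each 256-float vector is compressed to 1024 raw bits (a 64x reduction versus float32) while relative geometry survives, which is the property that makes aggressive sign-bit quantization workable at all.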

In practical terms, the **31 billion parameter Gemma 4** model can be run locally from a **20 GB download**, achieving approximately **10 tokens per second** on an **RTX 4090**. This contrasts sharply with models like **Kimi K2.5**, which requires a **600+ GB download** and extensive hardware for comparable performance. While **Gemma 4** is positioned as a solid all-around model suitable for fine-tuning, it is noted that it may not yet replace high-end coding tools.
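The download size itself hints at how aggressive the quantization is. A rough back-of-the-envelope check, using the figures quoted above and treating "GB" as 10^9 bytes:

```python
# What does a 20 GB download imply per weight for a 31B-parameter model?
# (Numbers from the video; GB interpreted as 10^9 bytes.)
params = 31e9
download_bytes = 20e9

bytes_per_param = download_bytes / params
bits_per_param = 8 * bytes_per_param

print(f"{bits_per_param:.1f} bits per parameter")
```

That works out to roughly 5 bits per parameter, versus 16 bits for an uncompressed fp16/bf16 checkpoint, so the shipped weights are stored at about a third of their "native" size before any runtime tricks.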

📜 Full Transcript (8 sections • 5,940 chars)

Last week, Google did something that no other FAANG company has had the balls to do: they released a large language model that qualifies as truly free and open source under the Apache 2.0 license. That means free as in total freedom, not open-ish, not research only, not please don't make money or we'll sue you. That model is Gemma 4. And my initial thought was, oh great, another half-baked open model that's technically free as long as you also own a small data center to run it. But the craziest thing about Gemma 4 is that it's small, like suspiciously small. The big model is small enough to run on a consumer GPU, and the Edge model is small enough to run on your phone or Raspberry Pi, while hitting intelligence levels that are on par with other open models that would normally require data center caliber GPUs just to run. That shouldn't be possible. And in today's

video, we'll find out how it works and look at some other crazy compression techniques developed by Google. It is April 8th, 2026, and you're watching the Code Report. To be fair, several other companies in the FAANG family have released open-weight models. Meta's Llama models are quasi free and open, but under a special license that gives Meta leverage over any developer that actually starts printing cash with them. Then we have OpenAI's GPT OSS models, which are also Apache 2.0 licensed, but they're bigger and dumber than Gemma. Outside of that, we basically rely on Mistral and the Chinese models like Qwen, GLM, Kimi, and DeepSeek. Gemma 4 hits different though, because it's made in America, Apache 2.0 licensed, intelligent, and most importantly, tiny.

For comparison, the 31 billion parameter version of Gemma 4 is scoring in the same ballpark as models like Kimi K2.5 thinking. But here's the absurd part. I can run Gemma 4 locally with a 20 GB download, getting roughly 10 tokens per second on a single RTX 4090. But if I wanted to run Kimi K2.5, I'd be looking at a 600-plus GB download, at least 256 GB of RAM, aggressive quantization, and multiple H100s just to get it off the ground. Kimi is still a better model than Gemma, but there's no way in hell I'm going to run it locally.
