Learning brief
Generated by AI from multiple sources. Always verify critical information.
TL;DR
Google released Gemma 4, a family of open-source AI models that run locally on your device. The 26-billion-parameter model ranks #6 among open models worldwide while activating only about 15% of its parameters at a time, hitting 300 tokens per second on a Mac Studio. For the first time, you can run frontier-class AI without paying OpenAI or Anthropic a dime.
What changed
Google released four Gemma 4 models (2B to 31B parameters) under Apache 2.0: fully open source, running locally.
Why it matters
The 26B model ranks #6 globally among open models while running on a laptop, outperforming models with 20x more parameters.
What to watch
Whether this forces OpenAI and Anthropic to drop prices or release their own open models.
What Happened
Google dropped four open-source AI models on April 2, 2026: Gemma 4 E2B (2 billion parameters), E4B (4 billion), 26B Mixture of Experts (MoE), and 31B Dense (Source 39). These aren't cloud-only API models — they run entirely on your hardware, from smartphones to laptops, under an Apache 2.0 license that lets you use them for free, modify them, and deploy them anywhere (Source 36).
The standout is the 26B MoE model. Think of it as a staff of 26 billion specialized workers where only 3.8 billion show up for any given task; the rest stay home. This "Mixture of Experts" design activates only about 15% of its parameters at once, hitting roughly 300 tokens per second on a Mac Studio M2 Ultra (Source 37). That's faster than most people type. Despite that sparse activation, it ranks #6 on the Arena AI leaderboard among all open models worldwide, beating systems with 20x more parameters (Source 39).
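The routing idea behind a Mixture of Experts can be sketched in a few lines: a small gating network scores every expert for each token, and only the top-scoring few actually run. This is a toy illustration of the general technique, not Gemma 4's actual architecture; the expert count, top-k value, and dimensions here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical; Gemma 4's real expert count isn't stated here
TOP_K = 2         # only this many experts run per token
DIM = 16

# In this toy, each "expert" is just a small feed-forward weight matrix.
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(NUM_EXPERTS)]
gate_w = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token through the TOP_K highest-scoring experts only."""
    scores = x @ gate_w                   # one score per expert
    top = np.argsort(scores)[-TOP_K:]     # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over just the chosen experts
    # Only TOP_K of the NUM_EXPERTS matrices are touched -- that's the saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(DIM)
out = moe_layer(token)
print(out.shape)                                       # (16,)
print(f"active fraction: {TOP_K / NUM_EXPERTS:.0%}")   # 25% of experts per token
```

The key point is the compute asymmetry: every expert's parameters sit in memory, but each token pays for only a small fraction of them, which is how a 26B-parameter model can answer at the speed of a much smaller one.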
Google built Gemma 4 from the same research behind Gemini 3 — their flagship cloud AI. But unlike Gemini (which requires a subscription and sends your data to Google's servers), Gemma runs offline. No internet connection required. No chat logs leaving your device. Developers have already downloaded Gemma models over 400 million times since the first version launched, spawning 100,000+ customized variants (Source 39). The 31B Dense model currently ranks #3 globally among open models (Source 39).
The entire family handles 256K-token contexts (128K for the smaller E2B/E4B models) — roughly 190,000 words, or about three full novels, in a single conversation (Source 37). All four models process audio, video, images, and text, and they're designed for "agentic workflows": AI that plans multi-step tasks and executes them without constant human guidance (Source 36).
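The words-per-context figure is simple arithmetic: at the common rule of thumb of roughly 0.75 English words per token, 256K tokens comes out just under 200,000 words. The ratio varies by tokenizer and language, and the 65,000-word novel length is an assumed typical figure, so treat this as a ballpark.

```python
TOKENS = 256 * 1024        # the 256K-token context window
WORDS_PER_TOKEN = 0.75     # rough English-text rule of thumb; varies by tokenizer
NOVEL_WORDS = 65_000       # assumed typical novel length, for scale

words = TOKENS * WORDS_PER_TOKEN
print(f"{words:,.0f} words")                 # 196,608 words
print(f"~{words / NOVEL_WORDS:.1f} novels")  # ~3.0 novels
```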
So What?
This changes the economics of running AI. If you're a developer building an app, you've had two options until now: pay OpenAI/Anthropic per API call (which scales painfully with users), or run open models that couldn't match GPT-4's quality. Gemma 4 26B closes that gap. A model ranking #6 globally that runs on a $2,000 Mac Studio means you can prototype, test, and even deploy production apps without touching a credit card. For context: running DeepSeek V3 (685B parameters) costs $14,000/month on cloud GPUs (Source 9). Gemma 4 26B runs on hardware you already own.
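The cost gap above can be made concrete with back-of-the-envelope math, with the caveat that it compares very different models (685B vs. 26B parameters) and ignores electricity and the different loads each setup can serve:

```python
LOCAL_HARDWARE = 2_000    # one-time Mac Studio cost (USD)
CLOUD_MONTHLY = 14_000    # Source 9's DeepSeek V3 cloud-GPU figure (USD/month)

# Days of cloud rental that would cost as much as buying the hardware outright.
breakeven_days = LOCAL_HARDWARE / (CLOUD_MONTHLY / 30)
print(f"break-even after ~{breakeven_days:.1f} days")   # ~4.3 days
```

Even if the cloud figure were several times smaller for a comparable model, the break-even would still land within the first month, which is the economic shift the paragraph describes.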
The offline capability matters more than it might seem. Google says Gemma 4 can run on "billions of Android devices" (Source 36). That means your phone could handle complex AI tasks — translation, code generation, document analysis — during a flight, in a remote area, or anywhere you don't want your data uploaded to a server. Someone already built a Chrome extension (Gemma Gem) that loads the 2B model directly in your browser to read webpages, click buttons, and type text — all without sending anything to the cloud (Source 10).
Here's the uncomfortable truth: this is Google's land grab, not charity. Open-sourcing Gemma 4 under Apache 2.0 isn't about developer love — it's about making Google's AI the default infrastructure layer before OpenAI and Anthropic lock everyone into their ecosystems. If millions of developers build on Gemma and train custom versions, Google controls the standard. That said, developers should take the free lunch. The license is permissive, the models are genuinely capable, and unlike Meta's Llama (which has usage restrictions), Apache 2.0 lets you do anything — including compete with Google.
Sources