Google DeepMind Launches Gemma 4 12B, a Multimodal AI Model Built to Run on Consumer Laptops

Jun 04, 2026

Google

Summary

Google DeepMind launches Gemma 4 12B, a multimodal AI model that runs locally on consumer laptops with just 16GB of VRAM. Its novel encoder-free architecture processes vision and audio inputs directly through the LLM backbone, lowering latency and memory use. The model is available now on Hugging Face and Kaggle under an Apache 2.0 license.
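A back-of-envelope estimate of weight memory shows why 16GB is a tight budget for a 12-billion-parameter model. The sketch below is illustrative only. It assumes exactly 12e9 parameters (taken from the "12B" in the name), counts weights alone, and ignores the KV cache, activations, and runtime overhead. The precision labels and the `weight_memory_gib` helper are hypothetical and not part of any Gemma tooling.

```python
# Back-of-envelope estimate of the memory needed just to hold model weights.
# Assumptions (not from the announcement): exactly 12e9 parameters, and no
# allowance for KV cache, activations, or framework overhead.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Return the approximate weight footprint in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    params = 12e9    # assumed from the "12B" in the model name
    budget_gib = 16  # the 16GB VRAM figure, treated as GiB for simplicity
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(params, precision)
        verdict = "fits" if gib < budget_gib else "does not fit"
        print(f"{precision:>5}: {gib:5.1f} GiB of weights ({verdict} in {budget_gib} GiB)")
```

Under these assumptions, 16-bit weights alone need roughly 22 GiB while 8-bit weights need about 11 GiB, so fitting in 16GB plausibly relies on quantized weights. The announcement as summarized here does not say which precision the laptop figure assumes.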

Key Points

Google DeepMind is launching Gemma 4 12B, a new multimodal AI model designed to run locally on consumer laptops with just 16GB of VRAM, bridging the gap between the smaller E4B and the larger 26B MoE model.
Gemma 4 12B introduces a novel encoder-free architecture that processes vision and audio inputs directly through the LLM backbone, eliminating traditional separate encoders to reduce latency and memory usage.
Released under an Apache 2.0 license, the model is available now on Hugging Face and Kaggle, supports popular developer tools like Ollama and LM Studio, and comes with Multi-Token Prediction drafters to further reduce inference latency.
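Multi-Token Prediction drafters belong to the draft-and-verify family of decoding techniques: a cheap drafter proposes several tokens, and the full model keeps only the ones it agrees with. The toy sketch below shows greedy speculative decoding in general, not Gemma's actual MTP implementation. The `target` and `drafter` functions are hypothetical stand-ins that return a single next token rather than a probability distribution.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
# Real drafters and target models return probability distributions.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding (generic sketch, not Gemma's code).

    The drafter proposes up to k tokens; the target keeps the longest prefix
    it agrees with and substitutes its own token at the first mismatch, so the
    result is identical to greedy decoding with `target` alone.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    out = list(prompt)
    produced = 0
    while produced < max_new:
        # 1. Draft up to k tokens cheaply, never exceeding the token budget.
        draft: List[int] = []
        for _ in range(min(k, max_new - produced)):
            draft.append(drafter(out + draft))
        # 2. Verify. A real system scores all drafted positions in one batched
        #    forward pass of the target; this toy calls it once per position.
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            accepted.append(expected)
            if tok != expected:
                break  # the target's correction ends this round
        out.extend(accepted)
        produced += len(accepted)
    return out
```

Because every kept token is one the target would have chosen itself, the output matches plain greedy decoding. In a real system, the latency saving comes from verifying all drafted positions in a single forward pass instead of running one pass per token.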
