NVIDIA Blackwell Slashes AI Token Costs by 35x Over Previous Generation as Data Centers Race to Deploy New Infrastructure

Apr 16, 2026
NVIDIA Blog

Summary

NVIDIA's Blackwell architecture is revolutionizing AI infrastructure, delivering 35x lower token costs and 50x greater output per watt than its predecessor Hopper, as major cloud providers race to deploy the new GPUs and redefine how AI computing efficiency is measured.

Key Points

  • Cost per token is emerging as the definitive metric for evaluating AI infrastructure, replacing outdated input-focused measures like FLOPS per dollar, as data centers shift from traditional workloads to AI token factories producing intelligence at scale.
  • The NVIDIA Blackwell architecture dramatically outperforms the previous Hopper generation, delivering over 50x greater token output per watt and nearly 35x lower cost per million tokens, despite costing only approximately 2x more per GPU per hour.
  • Leading cloud partners including CoreWeave, Nebius, Nscale, and Together AI are already deploying NVIDIA Blackwell infrastructure, with continuously optimized open-source inference software like vLLM, SGLang, TensorRT-LLM, and NVIDIA Dynamo driving token costs lower over time.
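The cost-per-token arithmetic behind these figures can be sketched as follows. All prices and throughput numbers below are illustrative assumptions, not NVIDIA-published benchmarks; they are chosen only so the ratios line up with the ~2x hourly price and ~35x lower cost per million tokens cited above:

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per one million output tokens for a single GPU.

    cost/token = (hourly price) / (tokens generated per hour)
    """
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000

# Hypothetical figures: a Hopper-class GPU at $2/hr, a Blackwell-class
# GPU at ~2x the hourly price but with ~70x the inference throughput.
hopper = cost_per_million_tokens(gpu_price_per_hour=2.0, tokens_per_second=100)
blackwell = cost_per_million_tokens(gpu_price_per_hour=4.0, tokens_per_second=7000)

# ~2x the price with ~70x the throughput nets out to ~35x lower cost per token.
print(round(hopper / blackwell))  # → 35
```

This is why cost per token, rather than hardware price alone, is the metric the article argues for: a GPU that costs twice as much per hour can still cut inference costs dramatically if its throughput gain outpaces its price premium.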
