Google has unveiled its first Tensor Processing Unit (TPU) designed specifically for inference, built for what the company calls the age of inference: an era in which AI agents proactively retrieve and generate data to deliver insights and answers. Meet Ironwood, the seventh-generation TPU and Google’s most performant, scalable, and energy-efficient design to date.
Ironwood represents a significant shift in the development of AI and the infrastructure that powers its progress. It’s designed to support the next phase of generative AI, with its massive computational and communication demands: the new TPU scales up to 9,216 liquid-cooled chips linked by breakthrough Inter-Chip Interconnect (ICI) networking, with a full pod drawing nearly 10 MW.
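For a rough sense of what that power figure means per chip, divide it by the chip count. This is a back-of-envelope sketch, assuming the ~10 MW covers the whole pod including cooling and networking overhead, so it’s an upper bound on per-chip draw, not a chip specification:

```python
# Back-of-envelope per-chip power budget, using only the figures quoted above.
pod_power_watts = 10e6    # ~10 MW for a full 9,216-chip pod (approximate)
chips_per_pod = 9_216

watts_per_chip = pod_power_watts / chips_per_pod
print(f"~{watts_per_chip:.0f} W per chip")  # ~1085 W, including shared overhead
```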
Key features of Ironwood include significant performance gains alongside a focus on power efficiency, allowing AI workloads to run more cost-effectively; a substantial increase in High Bandwidth Memory (HBM) capacity; dramatically improved HBM bandwidth, reaching 7.2 TBps per chip; and enhanced ICI bandwidth for faster communication between chips.
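HBM bandwidth matters so much for inference because token-by-token decoding is typically memory-bound: each generated token streams the model’s weights out of HBM. A quick illustration of the latency floor this implies; the 100 GB model size is hypothetical, not a figure from the announcement:

```python
# Why HBM bandwidth bounds decode latency: each token-generation step must
# read the resident model weights from HBM at least once.
hbm_bandwidth = 7.2e12   # bytes/s per chip (the 7.2 TBps figure above)
model_bytes = 100e9      # hypothetical 100 GB of weights held on one chip

min_seconds_per_token = model_bytes / hbm_bandwidth
print(f"{min_seconds_per_token * 1e3:.1f} ms minimum per decode step")  # ~13.9 ms
```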
With Ironwood, developers can leverage Google’s Pathways software stack to reliably harness the combined computing power of tens of thousands of TPU chips. The new TPU is designed to manage the complex computation and communication demands of “thinking models,” which encompass Large Language Models (LLMs), Mixture of Experts (MoEs), and advanced reasoning tasks.
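Pathways itself is Google-internal, but the programming model it enables, treating many chips as one logical machine, is visible in the JAX sharding APIs that Cloud TPU customers use. Here is a minimal sketch, not Pathways itself; the mesh and array sizes are placeholders, and on a machine with a single device it simply runs unsharded:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Treat all attached accelerators as one logical mesh with a single axis.
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch across the mesh; jit-compiled code then runs on every
# device, with inter-chip communication inserted automatically.
x = jax.device_put(jnp.ones((8 * len(devices), 1024)),
                   NamedSharding(mesh, P("data", None)))

@jax.jit
def step(x):
    return jnp.tanh(x) @ jnp.ones((1024, 1024))

y = step(x)        # executes across all devices in the mesh
print(y.sharding)  # shows how the result stayed distributed
```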
Ironwood comes in two sizes based on AI workload demands: a 256-chip configuration and a 9,216-chip configuration. When scaled to 9,216 chips per pod, for a total of 42.5 exaflops, Ironwood delivers more than 24x the compute power of the world’s largest supercomputer, El Capitan. This represents a monumental leap in AI capability.
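The pod-level number also pins down the implied per-chip peak and, assuming peak throughput scales linearly with chip count, the scale of the smaller configuration:

```python
# Per-chip peak implied by the full-pod figure, plus the 256-chip pod's total
# (assuming peak compute scales linearly with chip count).
pod_flops = 42.5e18          # 42.5 exaflops for the 9,216-chip pod
full_pod, small_pod = 9_216, 256

per_chip_flops = pod_flops / full_pod
print(f"~{per_chip_flops / 1e15:.2f} PFLOPs per chip")                  # ~4.61
print(f"~{per_chip_flops * small_pod / 1e18:.2f} EFLOPs at 256 chips")  # ~1.18
```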
With its enhanced SparseCore, a specialized accelerator for ultra-large embeddings, Ironwood can process the advanced ranking and recommendation workloads where those embeddings are common, with Pathways handling efficient distribution of the computation across many chips.
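The announcement doesn’t show SparseCore’s interface, but the access pattern it accelerates is easy to sketch: gathering sparse feature IDs from a large embedding table and pooling the results. A generic illustration in JAX, with made-up table and batch sizes:

```python
import jax.numpy as jnp

# The core sparse op in ranking/recommendation models: gather rows for a
# handful of feature IDs from a large embedding table, then pool per example.
vocab_size, embed_dim = 100_000, 128                 # illustrative sizes
table = jnp.zeros((vocab_size, embed_dim))           # embedding table (learned in practice)

feature_ids = jnp.array([[3, 17, 42], [7, 7, 99]])   # two examples, three sparse IDs each
vectors = table[feature_ids]                         # gather -> shape (2, 3, 128)
pooled = vectors.mean(axis=1)                        # pool -> shape (2, 128)
print(pooled.shape)
```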
Google Cloud is the only hyperscaler with more than a decade of experience delivering AI compute to support cutting-edge research, seamlessly integrated into planetary-scale services for billions of users every day. Ironwood continues that trajectory on the efficiency front: it delivers significantly more capacity per watt for customer workloads and is nearly 30x more power efficient than Google’s first Cloud TPU from 2018.
Ironwood is poised to meet the AI demands of tomorrow with increased computation power, memory capacity, ICI networking advancements, and reliability. These breakthroughs, coupled with a nearly 2x improvement in performance per watt over the previous generation, enable Google’s most demanding customers to take on training and serving workloads with the highest performance and lowest latency.
Source: https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference