Google’s Eighth-Generation TPU Separates “Training” and “Inference” into Dual Chips

This move marks Google’s launch of a new round of competition against Nvidia in the AI hardware field.

“Why move towards dedicated computing power?” Amin Vahdat, Google Senior Vice President and Chief Technology Officer of AI and Infrastructure, stated in an official blog post, “With the rise of AI agents, we believe that providing specialized and optimized chips for training and inference needs will benefit the entire technology ecosystem.”

Currently, AI inference speed has become the core battlefield for major companies. In March of this year, Nvidia heavily promoted a new chip that will be launched soon, which can enable models to quickly respond to user questions, mainly due to the technology acquired in its $20 billion acquisition of chip startup Groq. Against this backdrop, although Google remains a major customer of Nvidia, it is building alternative computing solutions by providing TPUs to cloud service providers.

In fact, it has become a consensus in the industry for tech giants to personally enter the chip-making arena and seek computing power autonomy. By deeply customizing the underlying architecture, companies can maximize the operating efficiency of specific application scenarios. From Apple’s Neural Engine integrated into iPhones for many years, to Microsoft’s second-generation AI chip iterated in January of this year, to Meta’s recent exposure of collaborating with Broadcom to develop multiple AI processors, all confirm this trend.

In this “chip-making movement,” Google is a pioneer. The company began deploying its self-developed AI processors in 2015 and has been providing computing power services to external customers through its cloud platform since 2018. In comparison, Amazon AWS launched its Inferentia chip dedicated to inference and its Trainium processor dedicated to training in 2018 and 2020, respectively.

Investment bank D.A. Davidson analysts estimated in a report last September that the combined valuation of Google’s TPU business and DeepMind AI department is approximately $900 billion.

Currently, Nvidia still dominates the AI computing power market. Google did not directly benchmark against Nvidia’s similar products in this release, but disclosed its own performance iteration data: at the same cost, the performance of the new training chip is 2.8 times that of the seventh-generation TPU (code-named Ironwood) released in November last year, and the performance of the new inference chip has increased by 80%.

It is worth noting that, on the technical route, the industry is unanimously betting on Static Random Access Memory (SRAM). Whether it’s Nvidia’s upcoming Groq 3 LPU or AI chip unicorn Cerebras, which just submitted its IPO application this month, both heavily rely on this technology. Google’s newly launched inference chip TPU 8i also follows this trend, with its SRAM capacity per chip reaching 384MB, three times that of the previous generation Ironwood.

Alphabet CEO Sundar Pichai pointed out in a blog post that the design goal of the new architecture is to “provide massive throughput and low latency, thereby supporting the concurrent operation of millions of AI agents with extremely high cost-effectiveness.”

In terms of terminal applications, Google disclosed that the commercialization of its AI chips is expanding. Among them, market maker Citadel Securities has developed quantitative research software based on TPU; the 17 national laboratories under the US Department of Energy are fully deploying the “AI Co-scientist” system based on this chip. In addition, AI startup Anthropic has committed to calling for Google TPU computing resources of up to several Gigawatts.