Google Releases Separate AI Training and Inference Chips, Challenging NVIDIA Again

Amin Vahdat, Google's senior vice president and chief technology officer of AI and infrastructure, wrote in a blog post, “With the rise of AI agents, we believe that chips specialized for training and deployment needs will benefit the industry.”

In March, NVIDIA touted upcoming chips that draw on technology from its $20 billion acquisition of chip startup Groq and are designed to let models respond quickly to user queries. Google is a major NVIDIA customer, but it also offers its own tensor processing units (TPUs) as an alternative to businesses using its cloud services.

Most of the world’s leading tech companies are developing AI-specific semiconductors to maximize computational efficiency and meet the needs of specific application scenarios. Apple has been integrating Neural Engine AI components into its self-designed iPhone chips for years; Microsoft released its second-generation AI chip in January of this year; and last week, Meta announced it is partnering with Broadcom to develop multiple AI processors.

Google pioneered this trend. It began using its in-house chips to run AI models in 2015 and started renting them to cloud service customers in 2018. Amazon Web Services launched the Inferentia chip for processing AI requests in 2018 and the Trainium processor for training AI models in 2020.

Analysts at investment firm D.A. Davidson estimated last September that the TPU business, combined with the Google DeepMind AI team, is worth approximately $900 billion.

For now, no tech giant has displaced NVIDIA, and Google did not even compare the performance of its new chips with those of the AI chip leader. However, Google says the new training chip delivers 2.8 times the performance of the seventh-generation Ironwood TPU, released in November of last year, at the same price, and that the inference chip’s performance is 80% higher.

NVIDIA says its upcoming Groq 3 LPU hardware will use a large amount of static random access memory (SRAM), a technology also used by AI chip maker Cerebras, which filed for an IPO earlier this month. Google’s new inference chip, codenamed TPU 8i, also relies on SRAM, with 384 MB built into a single chip, three times the capacity of the Ironwood TPU.

Alphabet CEO Sundar Pichai wrote in a blog post that the chip architecture is designed to “deliver massive throughput and low latency at a high value, meeting the needs of running millions of agents simultaneously.”

Adoption of Google's AI chips is expanding. Google says Citadel Securities has built quantitative research software on Google TPUs, all 17 national laboratories of the U.S. Department of Energy are using AI co-scientist software built on the chips, and AI company Anthropic has committed to using several gigawatts of Google TPU computing power.