Google in Talks with Marvell to Develop New AI Inference Chips

Google is reportedly in discussions with Marvell Technology to co-develop two new chips aimed at running AI models more efficiently. One is a memory processing unit designed to work alongside Google's Tensor Processing Units (TPUs); the other is a new TPU designed specifically for running AI models.

These moves underscore the rapidly growing demand for AI inference chips to support commercial products such as agents. At its GTC conference in March, Nvidia unveiled a chip designed to improve inference efficiency, the Language Processing Unit (LPU), based on technology it acquired from Groq for $20 billion.

Google has previously purchased data center chips from Marvell, but these were standard off-the-shelf products. The current discussions aim to develop semiconductors custom-designed for Google. They are a further sign that Google wants to diversify its partnerships and reduce its dependence on Broadcom, which has long been its sole design partner for TPUs.

According to a 2023 report by The Information, Google considered replacing Broadcom with Marvell as its supplier of the network interface chips that connect servers to Ethernet switches in its data centers.

A Google employee said that Google had already planned to develop new inference chips, and that this work accelerated after Nvidia launched the LPU. Marvell was the chip design partner for Groq’s first-generation LPU, so it has established experience in designing inference chips.

Funda AI previously reported on Google’s discussions with Marvell regarding a new TPU.

Google has previously purchased CXL controller chips from Marvell, which, according to two Google employees, are used to manage memory sharing between servers in data centers. That collaboration has given Google confidence in Marvell’s ability to co-design additional chips.

Two sources said Google’s new memory processing unit will work with TPUs, splitting AI computing tasks with them according to each task's compute and memory requirements. The two sides plan to complete the design of the memory processing unit as early as next year, followed by pilot production.

They also added that Google plans to produce nearly 2 million memory processing units, but this number could change as negotiations are still in the early stages. By comparison, Morgan Stanley estimates Google will produce approximately 6 million TPUs in 2027. It is currently unclear when the design of the new TPU will be completed, or Google’s planned production scale. The memory processing unit will be compatible with existing TPUs.

Currently, all of Google’s chips are manufactured by TSMC, and it remains unknown whether the new chips will be manufactured by TSMC or another vendor.

For years, Google used TPUs only within its own data centers, both to support businesses such as Search, YouTube, and the Gemini model and to serve Google Cloud customers. This changed last year when Google began renting out TPUs for use outside of Google data centers, directly challenging Nvidia’s dominance in the AI chip field. Google TPUs have also won favor from customers such as Anthropic, Meta, and Apple.

The rise of dedicated inference chips stems from AI companies launching more complex products like agents, which require far more computing power than traditional AI applications like chatbots.

However, not all inference tasks have the same characteristics. Some parts of generating responses require a lot of computing power, while others are limited by the speed at which chips read and write data in memory. Using different types of inference chips for different tasks, rather than relying on a single processor to perform all computations, has become a key way for AI companies to improve efficiency and reduce costs.
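
The distinction between compute-limited and memory-limited work can be illustrated with a rough roofline-style estimate. The sketch below is only an illustration: the chip figures and model size are hypothetical, and it assumes the two kinds of work correspond to processing a whole prompt in one pass versus generating one token at a time, which the reporting does not specify.

```python
def bound_by(flops: float, bytes_moved: float,
             peak_flops: float, peak_bandwidth: float) -> str:
    """Return which resource limits a workload under a simple roofline estimate."""
    compute_time = flops / peak_flops            # seconds if only arithmetic mattered
    memory_time = bytes_moved / peak_bandwidth   # seconds if only memory traffic mattered
    return "compute" if compute_time >= memory_time else "memory"


# Hypothetical chip: 1e15 FLOP/s of compute, 1e12 bytes/s of memory bandwidth.
PEAK_FLOPS = 1e15
PEAK_BANDWIDTH = 1e12

# Hypothetical model: 10 billion parameters stored at 1 byte each,
# costing roughly 2 FLOPs per parameter per token processed.
PARAMS = 10e9
WEIGHT_BYTES = PARAMS * 1

# Reading a 2,000-token prompt: the weights are read once for many tokens.
prompt_phase = bound_by(2 * PARAMS * 2000, WEIGHT_BYTES, PEAK_FLOPS, PEAK_BANDWIDTH)

# Generating one token: the weights are read once for a single token.
generation_phase = bound_by(2 * PARAMS * 1, WEIGHT_BYTES, PEAK_FLOPS, PEAK_BANDWIDTH)

print(prompt_phase, generation_phase)
```

Under these assumed numbers the prompt pass is limited by arithmetic and the token-by-token pass by memory reads, which is the kind of split that makes pairing different chips attractive.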

For example, OpenAI recently reached an agreement to purchase over $20 billion in inference chips from Cerebras, a competitor to Nvidia and Groq, and is also using inference chips from other companies. OpenAI is also collaborating with Broadcom to jointly develop its own inference chips.

Marvell’s main business is designing standard networking, storage, and optical interconnect chips for data centers. Its custom chip design business is growing rapidly and has become its fastest-growing segment.

Since 2023, Google has been trying to reduce its reliance on Broadcom, mainly because of Broadcom’s high fees. Broadcom charges a fee for each TPU produced, and as TPU demand surges, Google’s payments to Broadcom have also increased.

Last year, Google brought in Taiwanese company MediaTek to participate in the design and production of TPU chips, but Broadcom remains Google’s core chip design partner. Earlier this month, Broadcom and Google signed a new agreement to develop and supply custom TPUs and networking components for Google’s next-generation AI data center racks through 2031, indicating that Broadcom still occupies a central position in Google’s chip business.