Nvidia wants to own your AI data center from end to end

ZDNET’s key takeaways 

  • Nvidia showed off five racks of equipment covering all aspects of AI infrastructure.
  • Nvidia argues that AI economics are better when all the parts are from Nvidia.
  • Nvidia’s broadening ambition includes robotics and even AI in space.

The image Nvidia suggested to the media for its GTC conference in San Jose, Calif., this week is a line of 40 rectangles representing data center server racks of various kinds. No labels, just the racks standing like a bookshelf of the complete works of Shakespeare, or, more ominously, a phalanx of soldiers.

The implicit message of the imposing wall of racks is that Nvidia, if it doesn’t already, will ultimately own all processing in the data center, from one end to the other. 

Also: This OS quietly powers all AI – and most future IT jobs, too

On stage at the show, Nvidia CEO Jensen Huang used Monday’s keynote address to announce a broadening of the company’s chip and system offerings. Existing product lines include the Vera CPU chip and the Rubin GPU chip; joining them now is a new kind of equipment rack for ultra-fast inference, called the LPX.

A new rack just for AI inference

The LPX rack, which will be available later this year, is made up of chips Nvidia has designed using intellectual property it licensed in December from AI startup Groq for $20 billion. 

Nvidia’s reworking of the Groq approach, implemented in the Nvidia Groq 3 LPU, will be paired with Rubin GPUs in the LPX to strike a balance between inference speed and the total amount of data the system can handle.

The Groq 3 LPU “can combine the extreme FLOPS [floating-point operations per second] of GPUs and the bandwidth of LPUs into one,” said Ian Buck, Nvidia’s head of hyper-scale and high-performance computing, in a media pre-briefing.

Also: Cloud attacks are getting faster and deadlier – here’s your best defense plan

The original Groq LPU, whose name stands for “language processing unit,” has 500 megabytes of on-chip SRAM, a fast form of memory in a quantity far larger than a typical chip cache. The SRAM can hold the weights (aka neural parameters) of large language models, as well as the “KV cache,” the intermediate results of calculations that speed up inference.

With LPUs in a rack alongside GPUs, the most frequently needed data can be served straight from the LPU’s SRAM, reducing the trips to off-chip DRAM that GPUs otherwise have to make. That local SRAM dramatically lowers latency, the round-trip time to retrieve and output an answer to a query, said Buck.

“Things that took day-long queries are going to be produced in less than an hour,” said Buck.
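
To see why holding the KV cache close to the compute matters, consider how autoregressive decoding works: each new token’s attention step needs the keys and values of every token generated before it. The toy Python sketch below (not Nvidia or Groq code; the 1-millisecond cost is an invented stand-in for the key/value projection) shows how caching those values turns quadratic recomputation into linear work:

```python
# Toy illustration of why a KV cache cuts inference latency.
# Each decode step needs keys/values (K/V) for ALL previous tokens.
# Without a cache they are recomputed every step; with a cache,
# each step only computes K/V for the newest token.
import time

def make_kv(token):
    # Stand-in for the per-token key/value projection (the costly part).
    time.sleep(0.001)  # pretend this costs 1 ms of compute/memory traffic
    return (token * 2, token * 3)

def decode_without_cache(tokens):
    work = 0
    for t in range(1, len(tokens) + 1):
        kvs = [make_kv(tok) for tok in tokens[:t]]  # recompute everything
        work += len(kvs)
    return work

def decode_with_cache(tokens):
    cache, work = [], 0
    for tok in tokens:
        cache.append(make_kv(tok))  # only the newest token's K/V
        work += 1
    return work

tokens = list(range(64))
print(decode_without_cache(tokens))  # 2080 K/V computations: O(n^2)
print(decode_with_cache(tokens))     # 64 K/V computations: O(n)
```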

Changing the economics of AI 

The LPU can also perform query processing much more efficiently, Nvidia claims. Market research firm TechInsights has reported, based on Groq silicon that existed prior to the Nvidia deal, that the LPU’s “energy per bit” for memory access is one-third of a picojoule, roughly a twentieth of the 6 picojoules a GPU spends to access DRAM.

At the same price per token, Groq LPUs in the LPX rack will deliver 35 times as many tokens per second per megawatt of power, said Buck, citing the example of 500,000 tokens processed per second at a price of $45 per million tokens.

Also: Why you’ll pay more for AI in 2026, and 3 money-saving tips to try

That drastic speed-up in fetching and delivering tokens also translates into a 10-fold increase in the revenue an AI provider can earn per second per megawatt, said Buck.
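
A quick back-of-the-envelope check shows how those example numbers combine, assuming (our assumption, not a figure Nvidia stated) that the 500,000 tokens-per-second rate describes one megawatt of rack power:

```python
# Working through the example figures Buck cited in the keynote.
# Assumption: the 500,000 tok/s throughput applies to one megawatt.
tokens_per_sec = 500_000   # throughput example from the keynote
price_per_million = 45.0   # dollars per million tokens

revenue_per_sec = tokens_per_sec / 1_000_000 * price_per_million
print(f"${revenue_per_sec:.2f} per second per megawatt")        # $22.50
print(f"${revenue_per_sec * 3600:,.0f} per hour per megawatt")  # $81,000
```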

Though not explicitly mentioned, reducing off-chip DRAM use is increasingly important given that DRAM prices are soaring at the moment. 

Better when you buy it all from us

The LPX rack is part of Huang’s overall pitch to the AI world: that the company offers better economics by selling all parts of the equation — not just the Vera, Rubin, and LPU chips, but also the software that runs on top of them.

“From the five-layer-cake of energy, chips, the infrastructure itself, the models, and the applications, this multi-layer infrastructure is driving the revenue and job creation,” Nvidia’s Buck told reporters. 

The LPX stands in that row of 40 rectangles alongside four other racks that Huang talked about, which make up his company’s pitch for a complete AI infrastructure. 

Those four are:

  • The Vera-Rubin NVL72, a rack made up of 72 Rubin GPUs and 36 Vera CPUs.
  • A new CPU-only rack, the Vera CPU rack, consisting of 256 Vera CPUs and 400 terabytes of DRAM.
  • A new kind of data storage rack, the Bluefield 4 STX, which acts as a repository for the KV cache across all GPUs.
  • The latest version of Nvidia’s Ethernet networking equipment rack, the Spectrum-6 SPX.

Also: Nvidia’s physical AI models clear the way for next-gen robots – here’s what’s new

Buck explained that the Vera CPU racks speed up all the agentic-AI tasks that would be too much for a conventional Intel- or AMD-based x86 CPU.

“GPUs today actually call out to CPUs in order to do the tool calling, SQL query, and the compilation of code,” said Buck. “This sandbox execution is a critical part of both training and deploying agents across the data centers, and those CPUs need to be fast.”
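
As a rough illustration of that handoff, the hypothetical Python sketch below (the tool name and schema are invented; real agent frameworks differ) shows a model-emitted tool call being executed on the CPU side, here a read-only SQL query, with the result serialized for return to the model:

```python
# Hypothetical sketch of a GPU-to-CPU tool-call handoff: the model
# (running on GPUs) emits a structured tool call; a CPU-side worker
# executes it in a sandbox and returns the result as model context.
import json
import sqlite3

def run_tool_on_cpu(call: dict) -> str:
    """CPU-side 'sandbox': here, just a read-only query on SQLite."""
    if call["tool"] == "sql_query":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE sales (region TEXT, total REAL)")
        conn.execute("INSERT INTO sales VALUES ('EMEA', 1200.0), ('APAC', 950.0)")
        rows = conn.execute(call["query"]).fetchall()
        return json.dumps(rows)
    raise ValueError(f"unknown tool: {call['tool']}")

# Stand-in for a model-emitted tool call; a real agent loop would parse
# this from the LLM's output, then feed the result back as context.
tool_call = {"tool": "sql_query",
             "query": "SELECT region, total FROM sales ORDER BY total DESC"}
print(run_tool_on_cpu(tool_call))  # [["EMEA", 1200.0], ["APAC", 950.0]]
```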

He said the Vera CPU rack can be one and a half times faster than existing x86 CPUs on single-threaded tasks. The STX racks, meanwhile, will quadruple performance per watt, double the pages per second served for enterprise data, and deliver five times the tokens per second of context memory required for AI factories running GenTech workflows.

“The results are astounding,” said Buck.

The new data storage rack, explained Buck, is “a high-bandwidth shared layer optimized for storing and retrieving the massive key-value cache data generated by LLMs and GenTech workflows.” Although the rack is built from Nvidia Bluefield DPUs (data-processing units, companions to CPUs), the STX is only a “reference architecture,” said Buck, meaning the actual racks will be designed and built by Nvidia partners.
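
One plausible way to picture such a shared KV-cache layer (a sketch under our own assumptions, not Nvidia’s published design) is a store keyed by a hash of the token prefix, so that any GPU serving the same prompt prefix can reuse cache entries another GPU already computed:

```python
# Hypothetical sketch of a shared KV-cache tier: entries are keyed by a
# hash of the token prefix, so any GPU seeing the same prefix gets a hit.
import hashlib

class SharedKVStore:
    """Stand-in for a rack-level store; a real tier would sit on DPUs/NVMe."""
    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(repr(token_ids).encode()).hexdigest()

    def put(self, token_ids, kv_blob):
        self._store[self._key(token_ids)] = kv_blob

    def get(self, token_ids):
        return self._store.get(self._key(token_ids))

store = SharedKVStore()
prefix = [101, 2023, 2003]              # token IDs of a common prompt prefix
store.put(prefix, b"...kv tensors...")  # GPU A saves the cache it computed
print(store.get(prefix) is not None)    # GPU B reuses it: prints True
```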

Broadening ambition

The scale and breadth of ambition on display in Huang’s keynote are remarkable. As my colleague Radhika Rajkumar details in her coverage, Huang also talked up the company’s own offering for agentic AI, NemoClaw, and multiple offerings for so-called physical AI, principally robotics. Huang even touted AI in space, though the details of satellite-based server deployments remain vague, according to Radhika.

Buck characterized the wall of different servers as “an extreme end-to-end co-design in order to deliver the maximum value out of the AI factory for all of the workloads across AI and all industries.”

Also: Nvidia bets on OpenClaw, but adds a security layer – how NemoClaw works

It is also a canny way for Nvidia to make its value proposition evident to anyone who might consider competitor AMD’s CPUs and GPUs, or exotic AI equipment from startup challengers such as Cerebras Systems. With a portfolio of five racks spanning all the functions of the data center, Huang is telling customers it will all work more efficiently, and generate more AI revenue, when everything is supplied by Nvidia.

For Huang, it is also the culmination of a decades-long quest to take over parts of computing from the incumbents. In the past, he attempted to storm the server market with beefy CPUs such as Denver, but had to withdraw when the entrenched power of Intel’s Xeon proved too much to overcome.

With a bookshelf now stocked with the complete collected parts of a data center, Huang’s company stands poised to define this computing age and overwhelm the companies that defined the prior one.
