China just built a CPU-only AI monster because Nvidia GPUs remain banned

Huawei-linked LineShine supercomputer crams 12.45 million Arm cores into one enormous AI cluster
Huawei’s processors power one of China’s largest AI computing installations today
CPU-only supercomputers eliminate costly data transfers between processors and accelerators during workloads

China has deployed a massive CPU-only supercomputer called LineShine that delivers 1.54 exaflops of AI training performance without using any GPUs at all.

The system packs 20,480 compute nodes, each containing two LX2 processors for a total of 40,960 chips across the entire machine.

Each LX2 processor has 304 CPU cores, so the quoted chip count works out to roughly 12.45 million Armv9 cores in total (40,960 × 304 = 12,451,840).
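The totals above are simple multiplication, and a few lines of Python make the arithmetic easy to check. This is only a sketch using the node, chip, and core counts quoted in the article; the constant and function names are illustrative.

```python
# Figures quoted in the article.
NODES = 20_480
CHIPS_PER_NODE = 2
CORES_PER_CHIP = 304


def machine_totals(nodes=NODES, chips_per_node=CHIPS_PER_NODE,
                   cores_per_chip=CORES_PER_CHIP):
    """Return (total chips, total cores) for the whole machine."""
    chips = nodes * chips_per_node
    cores = chips * cores_per_chip
    return chips, cores


chips, cores = machine_totals()
print(f"{chips:,} chips, {cores:,} cores")
```

Run as written, the quoted figures multiply out to 40,960 chips and 12,451,840 cores.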

Inside the LX2 processor’s unusual architecture

The processor was developed either by Huawei alone or jointly with China’s National Supercomputing Center, though its exact origin remains undisclosed.

Each LX2 processor uses two compute chiplets, with its 304 cores organized into eight clusters of 38 cores each.

Every core includes Arm’s Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME) units, which accelerate the matrix operations used in AI training.

The processor delivers 60.3 teraflops of FP64 performance, 240 teraflops of BF16 throughput, and 960 teraops of INT8 performance from a single chip.
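To put the per-chip numbers in machine-wide terms, the sketch below multiplies them by the 40,960 chips quoted earlier. These are theoretical peak ceilings only. The article does not say which precision its 1.54-exaflop training figure refers to, so no direct comparison is implied. Names are illustrative.

```python
CHIPS = 40_960  # 20,480 nodes x 2 processors, as quoted in the article

# Per-chip peak throughput from the article, in teraflops (teraops for INT8).
PER_CHIP_TERA = {"FP64": 60.3, "BF16": 240.0, "INT8": 960.0}


def aggregate_peak_exa(chips=CHIPS, per_chip=PER_CHIP_TERA):
    """Scale per-chip peaks to the full machine, in exa-operations per second."""
    # 1 exa = 1,000,000 tera
    return {fmt: chips * tera / 1_000_000 for fmt, tera in per_chip.items()}


for fmt, exa in aggregate_peak_exa().items():
    print(f"{fmt}: {exa:.2f} exa-ops/s theoretical peak")
```

On these figures the machine-wide ceilings come to about 2.47 (FP64), 9.83 (BF16), and 39.32 (INT8) exa-operations per second.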

The memory subsystem combines 32GB of on-package HBM delivering up to 4TB/s of bandwidth with up to 256GB of off-package DDR5 memory.
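One way to read these memory figures is as a balance between bandwidth and compute. The sketch below derives HBM bytes per FP64 flop for a single chip, plus machine-wide capacities. It assumes decimal units (1 PB = 1,000,000 GB) and that every processor carries the maximum 256GB of DDR5, which the article states only as an upper bound.

```python
# Per-processor figures quoted in the article.
HBM_GB = 32
HBM_BW_TB_PER_S = 4.0
DDR5_MAX_GB = 256      # "up to" value; assumed fully populated here
FP64_TFLOPS = 60.3
CHIPS = 40_960


def memory_summary():
    """Return (HBM bytes per FP64 flop, total HBM in PB, total DDR5 in PB)."""
    # TB/s divided by TFLOP/s leaves bytes per floating-point operation.
    bytes_per_flop = HBM_BW_TB_PER_S / FP64_TFLOPS
    hbm_total_pb = CHIPS * HBM_GB / 1_000_000
    ddr_total_pb = CHIPS * DDR5_MAX_GB / 1_000_000
    return bytes_per_flop, hbm_total_pb, ddr_total_pb


bpf, hbm_pb, ddr_pb = memory_summary()
print(f"{bpf:.3f} bytes/flop, {hbm_pb:.2f} PB HBM, {ddr_pb:.2f} PB DDR5")
```

Under those assumptions each chip has roughly 0.066 bytes of HBM bandwidth per FP64 flop, and the full machine holds about 1.31 PB of HBM and up to 10.49 PB of DDR5.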

CPU-only systems offer several advantages for complex scientific tasks that combine AI training with massive data ingestion and preprocessing.

Since everything runs on the same processor and memory space, they avoid costly and bandwidth-hungry CPU-to-GPU data transfers.

Homogeneous CPU-based systems can also expose much larger coherent memory pools by combining HBM with large DDR capacities.

This is useful for handling massive scientific datasets, retrieval-augmented generation, and long context windows that limited GPU memory cannot easily accommodate.
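The data-transfer argument can be illustrated with a back-of-the-envelope model of how long it takes to copy a dataset once across a host-to-accelerator link. The link bandwidth below is an assumption chosen for illustration (roughly the range of a PCIe 5.0 x16 connection), not a figure from the article, and the function name is hypothetical.

```python
def staging_seconds(dataset_gb: float, link_gb_per_s: float) -> float:
    """Time to copy a dataset once across a host-to-accelerator link."""
    if link_gb_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    return dataset_gb / link_gb_per_s


ASSUMED_LINK_GB_PER_S = 64.0  # assumption for illustration, not from the article
DATASET_GB = 256.0            # equal to the maximum DDR5 capacity per LX2 processor

seconds = staging_seconds(DATASET_GB, ASSUMED_LINK_GB_PER_S)
print(f"{seconds:.1f} s per full copy")
```

With these assumed values a single full copy takes 4 seconds, a cost that a design keeping data in one memory space does not pay. Real systems overlap transfers with compute, so this is an upper-bound illustration rather than a measurement.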

The big caveat that comes with this approach

CPU-only systems are usually less power-efficient and deliver lower-density AI throughput than GPU-based supercomputers.

This is the main reason most of the industry bets on heterogeneous CPU-plus-GPU architectures for large-scale AI workloads.

China is pursuing this path largely due to US bans on GPU exports, not because CPU-only systems are technically superior for AI tasks.

LineShine shows that CPUs can handle work usually given to GPUs, but the efficiency gap between the two approaches remains substantial and is unlikely to close anytime soon.

China is making a strategic trade-off, accepting lower performance and higher power consumption in exchange for independence from foreign hardware and software ecosystems like Nvidia’s GPUs and CUDA.

Whether that trade-off makes sense for long-term AI development depends entirely on how quickly Chinese manufacturers can close the performance gap with their own GPU designs.

Until then, LineShine will remain a remarkable engineering achievement and a practical necessity, but probably not a blueprint for how most of the world will build AI supercomputers.

Via Tom’s Hardware
