Nvidia Unveils Blackwell, Its Next GPU


Today at Nvidia’s developer conference, GTC 2024, the company revealed its next GPU, the B200. The B200 is capable of delivering 4 times the training performance, up to 30 times the inference performance, and up to 25 times better energy efficiency compared with its predecessor, the Hopper H100 GPU. Based on the new Blackwell architecture, the GPU can be combined with the company’s Grace CPUs to form a new generation of DGX SuperPOD computers capable of up to 11.5 billion billion floating-point operations per second (exaflops) of AI computing using a new, low-precision number format.

“Blackwell is a new class of AI superchip,” says Ian Buck, Nvidia’s vice president of high-performance computing and hyperscale. Nvidia named the GPU architecture for mathematician David Harold Blackwell, the first Black inductee into the U.S. National Academy of Sciences.

The B200 consists of about 1,600 square millimeters of processor spread across two silicon dies that are linked in the same package by a 10-terabyte-per-second connection, so they perform as if they were a single 208-billion-transistor chip. Those slices of silicon are made using TSMC’s N4P chip technology, which provides a 6 percent performance boost over the N4 technology used to make Hopper-architecture GPUs, such as the H100.

Like Hopper chips, the B200 is surrounded by high-bandwidth memory, which is increasingly important to reducing the latency and energy consumption of large AI models. The B200’s memory is the latest variety, HBM3e, and it totals 192 gigabytes (up from 141 GB for the second-generation Hopper chip, the H200). Additionally, memory bandwidth is boosted to 8 terabytes per second from the H200’s 4.8 TB/s.

Smaller Numbers, Faster Chips

Chipmaking technology did some of the job in making Blackwell, but it’s what the GPU does with those transistors that really makes the difference. In explaining Nvidia’s AI success to computer scientists last year at IEEE Hot Chips, Nvidia chief scientist Bill Dally said that most of it came from using fewer and fewer bits to represent numbers in AI calculations. Blackwell continues that trend.

Its predecessor architecture, Hopper, was the first instance of what Nvidia calls the transformer engine. It’s a system that examines each layer of a neural network and determines whether it can be computed using lower-precision numbers. Specifically, Hopper can use floating-point number formats as small as 8 bits. Smaller numbers are faster and more energy efficient to compute, require less memory and memory bandwidth, and the logic required to do the math takes up less silicon.

“With Blackwell, we have taken it a step further,” says Buck. The new architecture has units that do matrix math with floating-point numbers just 4 bits wide. What’s more, it can decide to deploy them on parts of each neural-network layer, not just whole layers as Hopper does. “Getting down to that level of fine granularity is a miracle in itself,” says Buck.
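To get a sense of just how coarse 4-bit floating point is, here is a minimal sketch assuming an E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1). The article does not specify Blackwell’s exact FP4 encoding, so the layout is an assumption for illustration only.

```python
# Illustrative only: decode an assumed E2M1 4-bit float and list every value
# it can represent. Blackwell's actual FP4 encoding is not detailed here.

def fp4_e2m1_value(bits: int) -> float:
    """Decode a 4-bit E2M1 bit pattern (0-15) into the real value it encodes."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exponent = (bits >> 1) & 0b11   # 2-bit exponent, bias 1
    mantissa = bits & 0b1           # 1-bit mantissa
    if exponent == 0:
        return sign * (mantissa / 2)              # subnormal: no implicit leading 1
    return sign * (1 + mantissa / 2) * 2.0 ** (exponent - 1)

if __name__ == "__main__":
    values = sorted({fp4_e2m1_value(b) for b in range(16)})
    print(values)
    # [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

With only 15 distinct representable values, a format this small can work only where the numbers in a layer (or part of a layer) fall in a narrow range, which is why the per-block decisions Buck describes matter so much.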

NVLink and Other Features

Among the other architectural details Nvidia revealed about Blackwell is that it incorporates a dedicated “engine” devoted to the GPU’s reliability, availability, and serviceability. According to Nvidia, it uses an AI-based system to run diagnostics and forecast reliability issues, with the goal of increasing uptime and helping big AI systems run uninterrupted for weeks at a time, a period often needed to train large language models.

Nvidia also included systems to help keep AI models secure and to decompress data to speed up database queries and data analytics.

Finally, Blackwell incorporates Nvidia’s fifth-generation computer interconnect technology, NVLink, which now delivers 1.8 terabytes per second bidirectionally between GPUs and allows for high-speed communication among as many as 576 GPUs. Hopper’s version of NVLink could reach only half that bandwidth.

SuperPOD and Other Computers

NVLink’s bandwidth is crucial to building large-scale computers from Blackwell that are capable of crunching through trillion-parameter neural-network models.

The base computing unit is called the DGX GB200. Each of these includes 36 GB200 superchips, modules that combine a Grace CPU and two Blackwell GPUs, all linked together with NVLink.

The Grace Blackwell superchip is two Blackwell GPUs and a Grace CPU in the same module. [Image: Nvidia]

Eight DGX GB200s can be linked further via NVLink to form a 576-GPU supercomputer called a DGX SuperPOD. Nvidia says such a computer can blast through 11.5 exaflops of 4-bit-precision calculations. Systems of tens of thousands of GPUs are possible using the company’s Quantum InfiniBand networking technology.
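Those figures are easy to sanity-check from the numbers quoted in this article. The short Python sketch below multiplies out the GPU count and derives an approximate per-GPU throughput; the roughly 20 petaflops of 4-bit math per GPU is an inference from the stated totals, not a figure given directly here.

```python
# Back-of-envelope check on the DGX SuperPOD figures quoted above.
# The per-GPU number is derived, not stated by Nvidia in this article.

SUPERCHIPS_PER_DGX_GB200 = 36   # each superchip: 1 Grace CPU + 2 Blackwell GPUs
GPUS_PER_SUPERCHIP = 2
DGX_GB200_PER_SUPERPOD = 8

gpus = DGX_GB200_PER_SUPERPOD * SUPERCHIPS_PER_DGX_GB200 * GPUS_PER_SUPERCHIP
print(gpus)  # 576, matching the NVLink domain size

superpod_exaflops_fp4 = 11.5
petaflops_per_gpu = superpod_exaflops_fp4 * 1_000 / gpus
print(round(petaflops_per_gpu, 1))  # ~20 petaflops of 4-bit math per GPU
```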

The company says to expect SuperPODs and other Nvidia computers to become available later this year. Meanwhile, chip foundry TSMC and electronic design automation company Synopsys each announced that they would be moving Nvidia’s inverse lithography tool, cuLitho, into production. Finally, Nvidia announced a new foundation model for humanoid robots called GR00T.
