NVIDIA DGX B300: Architecture, Features, Specifications, and Ideal Use Cases
Cloud providers and infrastructure platforms continuously work to deliver modern technologies across their environments. This progress spans foundational cloud services such as storage all the way to advanced AI platforms and large-scale compute offerings.
Many AI environments are powered by some of the most capable GPUs currently available, including options such as the NVIDIA H200 and the AMD MI350X. At the same time, these technologies continue to evolve rapidly, with each new hardware generation introducing major performance gains over the last.
NVIDIA is a strong example of this pace of innovation. Previous generations such as the NVIDIA Hopper H100 and NVIDIA H200 already set a high standard, but NVIDIA has since introduced a newer microarchitecture: NVIDIA Blackwell. Compared with Hopper, Blackwell brings major improvements in several key areas. One of the most notable outcomes of this new architecture is the NVIDIA DGX B300 GPU system. It stands among the most powerful AI hardware systems commercially available and is positioned as a high-performance platform for modern AI workloads and large-scale AI factories.
This article takes a detailed look at the NVIDIA DGX B300, covering its technical specifications, highlighting its most important new capabilities, and concluding with guidance on when this system is the right choice. Follow along for an in-depth overview of one of the most talked-about developments in AI infrastructure.
Machine Overview: NVIDIA DGX B300
This section explores the NVIDIA DGX B300 in greater detail. It begins with the component architecture and hardware foundation that make the system so powerful. It then examines the capabilities of its GPUs and the Blackwell microarchitecture, which together push performance beyond previous generations.
NVIDIA DGX B300 Hardware Specs and Architecture Overview
| Category | Specification |
|---|---|
| System | NVIDIA DGX B300 |
| GPUs | 8× NVIDIA Blackwell Ultra SXM |
| CPU | Intel® Xeon® 6776P Processors |
| Total GPU Memory | 2.1 TB |
| Performance | FP4 Tensor Core: 144 PFLOPS (sparse), 108 PFLOPS (dense); FP8 Tensor Core: 72 PFLOPS (sparse) |
| NVIDIA NVLink™ Switch System | 2× |
| NVIDIA NVLink Bandwidth | 14.4 TB/s aggregate bandwidth |
| Networking | 8× OSFP ports (8× single-port NVIDIA ConnectX-8 VPI, up to 800 Gb/s InfiniBand/Ethernet); 2× dual-port QSFP112 NVIDIA BlueField-3 DPU (up to 400 Gb/s InfiniBand/Ethernet) |
| Management Network | 1GbE onboard NIC with RJ45; 1GbE RJ45 host BMC |
| Storage | OS: 2× 1.9 TB NVMe M.2; Internal: 8× 3.84 TB NVMe E1.S |
| Power Consumption | ~14 kW |
| Software | NVIDIA AI Enterprise; NVIDIA Mission Control (with NVIDIA Run:ai); NVIDIA DGX OS |
| Operating System Support | Red Hat Enterprise Linux, Rocky Linux, Ubuntu |
| Rack Units | 10U |
| Support | Three-year business-standard hardware and software support |
The NVIDIA DGX B300 is built around 8 NVIDIA Blackwell Ultra SXM GPUs paired with Intel® Xeon® 6776P processors. The specification table lists 2.1 TB of total GPU memory; each GPU carries 288 GB of HBM3e, so raw capacity works out to 8 × 288 GB ≈ 2.3 TB. In terms of raw compute, the platform reaches 144 PFLOPS (sparse) and 108 PFLOPS (dense) for FP4 Tensor Core operations, while FP8 Tensor Core workloads reach 72 PFLOPS (sparse). The GPUs are linked by 14.4 TB/s of aggregate NVLink bandwidth, and the whole system draws approximately 14 kW.
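The per-GPU share of these system totals follows from dividing by the GPU count. The sketch below uses only figures quoted in this section; the variable names are illustrative and not part of any NVIDIA tooling.

```python
# Figures quoted in the specification table and paragraph above.
NUM_GPUS = 8
HBM_PER_GPU_GB = 288
FP4_SPARSE_PFLOPS = 144
FP4_DENSE_PFLOPS = 108
FP8_SPARSE_PFLOPS = 72

# Raw HBM3e capacity across all GPUs, in GB.
raw_hbm_gb = NUM_GPUS * HBM_PER_GPU_GB

# Per-GPU share of each system-level compute figure.
fp4_sparse_per_gpu = FP4_SPARSE_PFLOPS / NUM_GPUS
fp4_dense_per_gpu = FP4_DENSE_PFLOPS / NUM_GPUS
fp8_sparse_per_gpu = FP8_SPARSE_PFLOPS / NUM_GPUS

# Ratios between the quoted figures.
fp4_dense_to_sparse = FP4_DENSE_PFLOPS / FP4_SPARSE_PFLOPS
fp4_to_fp8_sparse = FP4_SPARSE_PFLOPS / FP8_SPARSE_PFLOPS

print(f"Raw HBM3e capacity: {raw_hbm_gb} GB")
print(f"FP4 per GPU: {fp4_sparse_per_gpu} PFLOPS sparse, {fp4_dense_per_gpu} PFLOPS dense")
print(f"FP8 per GPU: {fp8_sparse_per_gpu} PFLOPS sparse")
print(f"FP4 dense/sparse ratio: {fp4_dense_to_sparse}, FP4/FP8 sparse ratio: {fp4_to_fp8_sparse}")
```

These are peak figures from the spec table; sustained throughput on a real workload will generally be lower.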
Architecture
Mounted at the front bezel are twelve 3.3 kW AC power supply units (PSUs), positioned above the GPU tray. That tray holds the 8 Blackwell Ultra SXM GPUs, which sit above the system memory. In the front section beneath the bezel are 2 BlueField-3 DPUs, M.2 boot drives, self-encrypting drives, and a DC-SCM. At the rear, the unit includes a backplane with 20 connected AC units and the necessary AC power inputs.
Features of the NVIDIA DGX B300
This section highlights several Blackwell GPU and B300-specific capabilities that demonstrate the potential of this system.
NVFP4 Quantization
4-bit quantization lowers the numerical precision of model weights and activations to only four bits, which is a major reduction compared with the more common 16-bit or 32-bit floating-point formats. Blackwell GPUs make it possible to process both inference and training workloads using this lower-precision format. The result is a dramatic increase in speed for training and inference tasks while still preserving a high level of model capability.
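To make the idea concrete, here is a minimal pure-Python sketch of block-scaled 4-bit floating-point quantization. It is an illustration under stated assumptions, not NVIDIA's implementation: the 16-element block size and the E2M1 value grid reflect the commonly described NVFP4 layout rather than anything stated in this article, the per-block scale is kept as an ordinary Python float instead of a low-precision format, and all function names are hypothetical.

```python
# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumed micro-block size: one shared scale per block


def quantize_block(block):
    """Return (scale, codes); each code is a signed value from E2M1_GRID."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest grid value.
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        # Round the scaled magnitude to the nearest representable value.
        magnitude = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-magnitude if x < 0 else magnitude)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from a scale and its codes."""
    return [scale * c for c in codes]


def quantize(values):
    """Quantize a flat list of floats block by block."""
    blocks = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return [quantize_block(b) for b in blocks]


def dequantize(quantized):
    """Inverse of quantize(): concatenate the reconstructed blocks."""
    out = []
    for scale, codes in quantized:
        out.extend(dequantize_block(scale, codes))
    return out


if __name__ == "__main__":
    weights = [0.01 * i - 0.15 for i in range(32)]
    restored = dequantize(quantize(weights))
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max abs reconstruction error: {max_err:.4f}")
```

Because every block carries its own scale, an outlier in one block does not degrade the resolution of the others. That is the main reason block-scaled formats hold up better than a single per-tensor scale at 4 bits.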
Second-Generation Transformer Engine
The second-generation NVIDIA Transformer Engine combines Blackwell-generation Tensor Core hardware with software improvements found in NVIDIA TensorRT-LLM and the NeMo Framework to substantially improve both training and inference for large language models and Mixture-of-Experts architectures. Built on NVIDIA Blackwell Ultra Tensor Cores, the platform provides about twice the acceleration in attention layers and around 1.5 times greater overall AI compute throughput compared with standard Blackwell GPUs. These Tensor Cores also introduce new precision modes, including community-defined microscaling formats, making it possible to replace higher-precision data types without compromising numerical accuracy. Through fine-grained micro-tensor scaling, the Blackwell Transformer Engine efficiently supports 4-bit floating-point (FP4) processing, enabling models to run faster and grow larger within the same memory limits while maintaining strong accuracy.
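The claim that lower precision lets models "grow larger within the same memory limits" can be shown with simple arithmetic. The sketch below counts weight storage only and ignores scale-factor overhead, activations, KV cache, and optimizer state, so real capacity is lower. The 70-billion-parameter model is a hypothetical example, the helper names are illustrative, and the 288 GB figure is the per-GPU HBM3e capacity quoted earlier in this article.

```python
BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


def max_params_billion(memory_gb, bits_per_param):
    """Largest parameter count (in billions) whose weights fit in memory_gb."""
    return memory_gb * 8 / bits_per_param


if __name__ == "__main__":
    hypothetical_params = 70e9  # illustrative model size, not from the article
    gpu_memory_gb = 288         # per-GPU HBM3e capacity quoted in the article
    for name, bits in BITS_PER_PARAM.items():
        size = weight_memory_gb(hypothetical_params, bits)
        fits = max_params_billion(gpu_memory_gb, bits)
        print(f"{name}: 70B weights need {size:.0f} GB; "
              f"one GPU holds up to {fits:.0f}B parameters of weights")
```

Halving the bits per parameter halves the weight footprint, so moving from FP16 to FP4 quadruples the number of parameters that fit in the same memory.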
Decompression Engine
In the past, database and analytics workloads were largely handled by CPUs, but GPU-accelerated data science can significantly improve end-to-end performance by reducing time-to-insight and lowering total processing costs. Current analytics platforms and databases, including Apache Spark, play a central role in collecting, transforming, and querying large-scale datasets. NVIDIA Blackwell improves these workflows through a dedicated Decompression Engine. On Grace-based Blackwell platforms, the GPU can also access the large memory pool of the NVIDIA Grace™ CPU over an interconnect capable of up to 900 GB/s of bidirectional bandwidth; the DGX B300 pairs its GPUs with Intel Xeon processors, so that part applies to the wider Blackwell family rather than to this system. The Decompression Engine speeds up the full lifecycle of analytics and database queries and supports modern compression formats such as LZ4, Snappy, and Deflate, leading to better throughput and more efficient data handling.
Reliability, Availability, and Serviceability (RAS) Engine
NVIDIA Blackwell strengthens system resilience through a dedicated Reliability, Availability, and Serviceability (RAS) Engine that is designed to identify hardware and software issues before they disrupt production. Using AI-based predictive management, the platform continuously evaluates thousands of telemetry signals across the entire system stack to measure system health and help avoid failures, inefficiencies, and unexpected outages. The RAS Engine also provides detailed diagnostic visibility, making it easier to locate emerging issues, accelerate troubleshooting, and improve maintenance planning. By isolating faults quickly and enabling precise remediation, Blackwell’s intelligent resiliency features help reduce downtime, operational overhead, and wasted energy and compute resources.
Additional Features
| Feature | Description |
|---|---|
| GPU | 8 × NVIDIA B300 Blackwell Ultra GPUs |
| GPU Memory | 8 × 288 GB = 2.3 TB total |
| Performance | 72 PFLOPS FP8 (training); 144 PFLOPS FP4 (inference) |
| NVSwitch | 2 × 5th-generation NVIDIA NVLink™ interconnects |
| CPUs | 2 × Intel® Xeon® Platinum 6776P processors |
| System Memory | 2 TB default (up to 4 TB) |
| Networking Connectivity & Speed | Cluster network: 8 × OSFP ports connected to 8 × NVIDIA® ConnectX®-8 cards, 8 × 800 Gb/s InfiniBand/Ethernet; Storage and management networks: 2 × dual-port NVIDIA® BlueField®-3 DPUs, 2 × 400 Gb/s InfiniBand/Ethernet |
| Cache Storage | 8 × E1.S 3.84 TB NVMe self-encrypting drives |
| Boot Storage | 2 × 1.92 TB M.2 NVMe (software-encryptable) |
| Host Management | On-board 1 GbE RJ-45 Ethernet |
| Remote System Management | Baseboard Management Controller (BMC) with 1 GbE RJ-45 network connectivity; remote keyboard, video, mouse (KVM); remote storage; Redfish and IPMI management |
| Operating System | DGX OS 7 based on Ubuntu 24.04 LTS; additional support for Ubuntu, Red Hat Enterprise Linux 8 & 9, and Rocky Linux |
Driven by NVIDIA Blackwell Ultra GPUs, the DGX B300 is designed as an integrated platform for high-throughput large language model inference and training. With as much as 144 petaFLOPS of inference performance, the system provides hyperscale-class AI capability in an enterprise-ready form factor, allowing organizations of different sizes to run real-time, production-level AI workloads. The platform is also built with flexibility in mind, offering multiple power configuration choices and strong performance per watt, which helps position it among the most energy-efficient AI supercomputers currently available. Its updated architecture can also be deployed in NVIDIA MGX racks for the first time, creating a more standardized infrastructure approach that simplifies data center integration while improving efficiency and scalability.
At the heart of the platform are NVIDIA Blackwell GPUs, each built with 208 billion transistors on a custom TSMC 4NP process and made up of two reticle-limited dies joined by a 10 TB/s chip-to-chip interconnect.
Blackwell also adds NVIDIA Confidential Computing, which provides hardware-enforced protection for sensitive data and AI models with minimal performance loss. As the first GPU to support TEE-I/O, Blackwell can run secure training, inference, and federated learning workloads at near-native throughput, including across protected NVIDIA NVLink connections.
To support AI at exascale, fifth-generation NVIDIA NVLink enables fast, balanced communication across as many as 576 GPUs. The NVLink Switch Chip provides up to 130 TB/s of bandwidth within a 72-GPU NVLink domain and extends the 1.8 TB/s per-GPU NVLink bandwidth across multi-node clusters. This makes it possible for a 72-GPU domain to achieve up to nine times the GPU throughput of a single eight-GPU system, with SHARP FP8 acceleration keeping communication efficient.
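The NVLink figures quoted here and in the specification table are consistent with one another, as a short calculation shows. This is a sketch using only numbers from the article; the variable names are illustrative.

```python
PER_GPU_NVLINK_TBPS = 1.8      # per-GPU NVLink bandwidth quoted above
GPUS_PER_DGX = 8               # GPUs in one DGX B300
GPUS_PER_NVLINK_DOMAIN = 72    # GPUs in the 72-GPU NVLink domain

# Aggregate NVLink bandwidth of one eight-GPU system.
dgx_aggregate_tbps = PER_GPU_NVLINK_TBPS * GPUS_PER_DGX

# Aggregate NVLink bandwidth of a 72-GPU domain.
domain_aggregate_tbps = PER_GPU_NVLINK_TBPS * GPUS_PER_NVLINK_DOMAIN

# How many eight-GPU systems' worth of GPUs sit in one domain.
gpu_count_ratio = GPUS_PER_NVLINK_DOMAIN // GPUS_PER_DGX

print(f"One 8-GPU system: {dgx_aggregate_tbps:.1f} TB/s")
print(f"72-GPU domain:    {domain_aggregate_tbps:.1f} TB/s")
print(f"GPU count ratio:  {gpu_count_ratio}x")
```

The eight-GPU result matches the 14.4 TB/s aggregate in the spec table, the 72-GPU result rounds to the quoted 130 TB/s, and the GPU-count ratio of 9 is consistent with the nine-fold throughput figure.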
When to Use the NVIDIA DGX B300
In summary, this system stands among the most powerful pieces of HPC hardware currently available on the market. Based on that, several key conclusions can be drawn:
- Because it is one of the most powerful commercially available systems today, it is also one of the most capable options for demanding workloads. A very wide range of problems can be handled on this platform, often more quickly than on lower-tier machines.
- NVFP4 makes it especially well suited for low-precision workloads such as large pre-training tasks.
- It is also more expensive to operate than previous-generation GPUs when both initial acquisition cost and energy demands are taken into account.
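The energy side of the operating cost can be estimated from the roughly 14 kW power figure in the specification table. In the sketch below, the electricity price, the power usage effectiveness (PUE) factor for cooling and facility overhead, and the function names are all hypothetical placeholders; substitute figures from your own facility.

```python
SYSTEM_POWER_KW = 14.0        # approximate figure from the spec table
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(power_kw, utilization=1.0):
    """Energy drawn over one year at the given average utilization (0 to 1)."""
    return power_kw * HOURS_PER_YEAR * utilization


def annual_energy_cost(power_kw, price_per_kwh, pue=1.0, utilization=1.0):
    """Yearly electricity cost, scaled by PUE to include facility overhead."""
    return annual_energy_kwh(power_kw, utilization) * pue * price_per_kwh


if __name__ == "__main__":
    hypothetical_price = 0.10   # assumed price per kWh, in any currency
    hypothetical_pue = 1.3      # assumed data center overhead factor
    energy = annual_energy_kwh(SYSTEM_POWER_KW)
    cost = annual_energy_cost(SYSTEM_POWER_KW, hypothetical_price, pue=hypothetical_pue)
    print(f"Energy per year at full load: {energy:,.0f} kWh")
    print(f"Estimated yearly electricity cost: {cost:,.0f}")
```

At continuous full load, 14 kW amounts to about 122,640 kWh per year before facility overhead, which is why power and cooling capacity deserve as much planning attention as the purchase price.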
Based on these points, the NVIDIA DGX B300 can be considered a strong fit for nearly any high-performance AI scenario. It is especially well suited to training or deploying very large AI models, where its NVLink bandwidth and large GPU memory capacity matter most.
Closing Thoughts
As shown throughout this article, the NVIDIA DGX B300 marks a major step forward in modern AI infrastructure. It combines high compute density, large memory capacity, and architectural advances such as FP4 support and fifth-generation NVLink to expand what is possible for training and inference at scale. Although its power profile and cost place it firmly in the category of serious enterprise and research-class hardware, teams working at the edge of model size, throughput, and latency can gain substantial value from the per-system capability it delivers.


