Nvidia H100 Benchmarks – Deploy Faster AI Apps – 175B Parameters in 11 Minutes!!

 Nvidia reigns supreme – Crushes MLPerf 3.0 Benchmarks – AMD Who🤔

What is MLPerf?

MLPerf is a suite of benchmarks maintained by MLCommons. Its tests draw on cutting-edge AI research papers, practical implementations of those AI models, and the industries that actually deploy them.

MLPerf keeps launching new tests as Machine Learning / AI hardware, software & services keep evolving.

These tests are comparable to gaming graphics card benchmarking software, and they share the same goal: push the hardware to its fullest capabilities so that end users and companies know they are getting the best product money can buy.

Diagram showing the variety of tests included in the MLPerf v3.0 benchmark suite.

Nvidia H100 Specs

Product Specifications

| Specification | H100 SXM | H100 PCIe | H100 NVL¹ |
|---|---|---|---|
| FP64 | 34 teraFLOPS | 26 teraFLOPS | 68 teraFLOPS |
| FP64 Tensor Core | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| FP32 | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| TF32 Tensor Core | 989 teraFLOPS² | 756 teraFLOPS² | 1,979 teraFLOPS² |
| BFLOAT16 Tensor Core | 1,979 teraFLOPS² | 1,513 teraFLOPS² | 3,958 teraFLOPS² |
| FP16 Tensor Core | 1,979 teraFLOPS² | 1,513 teraFLOPS² | 3,958 teraFLOPS² |
| FP8 Tensor Core | 3,958 teraFLOPS² | 3,026 teraFLOPS² | 7,916 teraFLOPS² |
| INT8 Tensor Core | 3,958 TOPS² | 3,026 TOPS² | 7,916 TOPS² |
| GPU memory | 80GB | 80GB | 188GB |
| GPU memory bandwidth | 3.35TB/s | 2TB/s | 7.8TB/s³ |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 14 NVDEC, 14 JPEG |
| Max thermal design power (TDP) | Up to 700W (configurable) | 300–350W (configurable) | 2x 350–400W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | Up to 14 MIGs @ 12GB each |
| Form factor | SXM | PCIe, dual-slot air-cooled | 2x PCIe, dual-slot air-cooled |
| Interconnect | NVLink: 900GB/s; PCIe Gen5: 128GB/s | NVLink: 600GB/s; PCIe Gen5: 128GB/s | NVLink: 600GB/s; PCIe Gen5: 128GB/s |
| Server options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1–8 GPUs | Partner and NVIDIA-Certified Systems with 2–4 pairs |
| NVIDIA AI Enterprise | Add-on | Included | Add-on |

¹ Preliminary specifications; subject to change.
² With sparsity.
³ Aggregate bandwidth across both GPUs.


Nvidia H100 Price – It’s costly – Believe it

The Nvidia H100 has been quite elusive since the very start. You can rent it from various cloud GPU platforms like Azure, Google Cloud, AWS, Vultr, Lambda Labs, etc.

The Nvidia H100 80GB model sells for a whopping $30,000 or INR 2,461,419 (24+ lakh).

The Nvidia H100 NVL 188GB model could cost about $60,000 or INR 4,922,838 (49+ lakh), as it is essentially two H100s bridged via NVLink.

An Nvidia DGX H100 server containing 8 GPUs is said to cost about $520,000 or INR 42,663,660 (4.26+ crore) with 5 years of support.

(Image: Nvidia's AI chips priced over $40,000 on eBay amid surging AI demand – Gizmochina)

Nvidia H100 Benchmarks – As released by MLCommons

MLCommons released the benchmarks of the Nvidia H100, and they are staggering. Nvidia's new GPUs for machine learning and AI dominated every aspect of the tests.

The LLM and BERT natural language processing benchmarks were run on a system co-developed by Nvidia & Inflection AI.

CoreWeave hosted the entire system of GPUs along with related hardware.

Graphic showing NVIDIA's results across a range of MLPerf 3.0 tests.

The LLM benchmark was based on OpenAI's GPT-3, a model with 175 billion parameters.

Lambda Labs has estimated that training such a huge LLM requires about 3.14E23 FLOPs of compute. It is an expensive, time-consuming task whose duration depends on how many GPUs are connected and on the efficiency of each individual GPU.
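That 3.14E23 figure can be sanity-checked with the widely used back-of-the-envelope rule that training compute is roughly 6 × parameters × tokens. A minimal sketch — the 300-billion-token count below is an assumption taken from the GPT-3 paper, not from this article:

```python
# Rough estimate of GPT-3 training compute via C ≈ 6 * N * D,
# where N = parameter count and D = training tokens.

params = 175e9   # GPT-3 parameters (175B)
tokens = 300e9   # training tokens (assumption, per the GPT-3 paper)

total_flops = 6 * params * tokens
print(f"Estimated training compute: {total_flops:.2e} FLOPs")
```

This lands at about 3.15E23 FLOPs, in line with the Lambda Labs figure.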

In comparison to the other GPUs included in the tests (none from AMD), the H100 set some amazing records. The Nvidia H100 Tensor Core GPU yielded a per-accelerator LLM training time of 548 hours (~23 days).

Chart showing NVIDIA H100 MLPerf Results across benchmarks.

Now, Nvidia H100s are not consumer cards; they are built for enterprise-level training & inference. So it is safe to assume that H100s will be used in clusters rather than as single GPUs powering training & inference.

To test how the H100 performs in a cluster setup, Nvidia & Inflection AI co-developed a GPU cluster built on Nvidia H100 Tensor Core GPUs. It was again hosted & tested by CoreWeave.

Nvidia H100 Tensor Core GPU Cluster

The cluster had 3,584 Nvidia H100 accelerators paired with 896 4th-gen Intel Xeon Platinum 8462Y+ processors (marketed at $5,645 or INR 487,730 each). That's a really hefty combination for maximum workloads.

On this cluster, the GPT-3 175B LLM benchmark completed in just 11 minutes; in comparison, another, Intel-based cluster took about 311 minutes.
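The 548-hour per-accelerator figure and the 11-minute cluster result line up surprisingly well. A quick check using the numbers quoted above — note that "efficiency" here is a naive linear-scaling comparison, not an official MLPerf metric:

```python
# Compare ideal linear scaling of 548 GPU-hours across 3,584 GPUs
# against the measured 11-minute cluster run.

per_gpu_hours = 548        # per-accelerator training time
cluster_gpus = 3584        # H100s in the Nvidia/Inflection AI cluster
measured_minutes = 11      # reported cluster result

ideal_minutes = per_gpu_hours * 60 / cluster_gpus   # ~9.2 min with perfect scaling
efficiency = ideal_minutes / measured_minutes       # ~0.83

speedup_vs_intel = 311 / measured_minutes           # vs the 311-minute Intel result
print(f"Ideal time: {ideal_minutes:.1f} min, scaling efficiency: {efficiency:.0%}")
print(f"Speedup over the Intel cluster: {speedup_vs_intel:.1f}x")
```

Roughly 83% scaling efficiency at 3,584 GPUs, and about a 28x speedup over the Intel cluster.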

Intel’s hardware included 64-96 Intel Xeon Platinum 8380 processors with 256-389 Intel Habana Gaudi2 accelerators.

Graphic showing NVIDIA H100 results across workloads.

It truly puts things in perspective for the companies that are going to deploy the Nvidia H100 Tensor Core GPU in their machine learning / AI environments.

Nvidia H100 Cloud GPU – Cost & Companies

Unless you already have a profitable AI company or have good funding to support your growing AI business, upgrading to H100 is going to be quite costly.

As mentioned earlier in this article, Nvidia H100s cost about $30,000–$50,000, and with several big players trying to snap them up as soon as possible, prices are likely to inflate further.

You will always have a cloud solution for Nvidia H100 Tensor Core GPU.

CoreWeave will rent you H100 GPUs for $2.23/hour ($1,605.60 per month, or INR 131,724, if run 24/7).
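The rent-vs-buy arithmetic is worth spelling out. A minimal sketch using the CoreWeave rate above and the article's ~$30,000 purchase estimate — real total cost of ownership (power, cooling, hosting, resale value) would shift the break-even point:

```python
# Rent-vs-buy break-even for a single H100 at the quoted cloud rate.

hourly_rate = 2.23        # USD/hour (CoreWeave, as quoted above)
purchase_price = 30_000   # USD, approximate H100 80GB street price

monthly_cost = hourly_rate * 24 * 30            # 24/7 rental for a 30-day month
break_even_hours = purchase_price / hourly_rate
break_even_months = break_even_hours / (24 * 30)

print(f"Monthly (24/7): ${monthly_cost:,.2f}")
print(f"Break-even vs buying: {break_even_hours:,.0f} hours (~{break_even_months:.0f} months)")
```

In other words, renting only overtakes the sticker price after about a year and a half of continuous use, which is why cloud H100s make sense for bursty workloads.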

Vultr currently displays "Contact Sales" for its H100 GPU offerings, so hourly rates were unavailable at the time of writing.

I've created a separate blog post with information only about Nvidia H100 cloud GPU providers; it is updated as soon as new cloud GPU providers are found. Check out the link for H100 Cloud GPU providers.

What’s Next!

I am waiting for MLPerf's benchmarks of AMD's machine learning GPUs, namely the AMD MI300X with its 192 GB of HBM memory for LLMs.

Till then, you can check out the Top Open Source LLMs currently available on HuggingFace.

Thanks for reading.
