Nvidia reigns supreme – Crushes MLPerf 3.0 Benchmarks – AMD Who🤔
What is MLPerf?
MLPerf is an industry-standard suite of AI benchmarks maintained by MLCommons. It combines tests drawn from influential AI research papers, practical implementations of those AI models & the industries actually deploying them.
MLPerf keeps launching new tests as machine learning / AI hardware, software & services keep evolving.
These tests can be compared with gaming graphics card benchmarking software, with the same goal in mind: push the hardware to its fullest capabilities so that end-users & companies can pick the best product their money can buy.

Nvidia H100 Specs
Product Specifications

| Specification | H100 SXM | H100 PCIe | H100 NVL |
| --- | --- | --- | --- |
| FP64 | 34 teraFLOPS | 26 teraFLOPS | 68 teraFLOPS |
| FP64 Tensor Core | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| FP32 | 67 teraFLOPS | 51 teraFLOPS | 134 teraFLOPS |
| TF32 Tensor Core | 989 teraFLOPS* | 756 teraFLOPS* | 1,979 teraFLOPS* |
| BFLOAT16 Tensor Core | 1,979 teraFLOPS* | 1,513 teraFLOPS* | 3,958 teraFLOPS* |
| FP16 Tensor Core | 1,979 teraFLOPS* | 1,513 teraFLOPS* | 3,958 teraFLOPS* |
| FP8 Tensor Core | 3,958 teraFLOPS* | 3,026 teraFLOPS* | 7,916 teraFLOPS* |
| INT8 Tensor Core | 3,958 TOPS* | 3,026 TOPS* | 7,916 TOPS* |
| GPU memory | 80GB | 80GB | 188GB |
| GPU memory bandwidth | 3.35TB/s | 2TB/s | 7.8TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 14 NVDEC, 14 JPEG |
| Max thermal design power (TDP) | Up to 700W (configurable) | 300-350W (configurable) | 2x 350-400W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each | Up to 14 MIGs @ 12GB each |
| Form factor | SXM | PCIe, dual-slot air-cooled | 2x PCIe, dual-slot air-cooled |
| Interconnect | NVLink: 900GB/s, PCIe Gen5: 128GB/s | NVLink: 600GB/s, PCIe Gen5: 128GB/s | NVLink: 600GB/s, PCIe Gen5: 128GB/s |
| Server options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs | Partner and NVIDIA-Certified Systems with 2-4 pairs |
| NVIDIA AI Enterprise | Add-on | Included | Add-on |

*Tensor Core figures are quoted with sparsity. H100 NVL figures are the combined totals of the two NVLink-bridged GPUs.
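If you rent one of these from a cloud provider, a quick sanity check that you actually got the hardware in the table above takes a few lines of Python. This is a minimal sketch, assuming PyTorch with CUDA support is installed:

```python
# Quick sanity check of the GPU(s) a cloud provider actually handed you.
# Illustrative sketch only; requires PyTorch built with CUDA support.
import torch

def describe_gpus():
    for i in range(torch.cuda.device_count()):
        p = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {p.name}, "
              f"{p.total_memory / 1024**3:.0f} GB, "
              f"{p.multi_processor_count} SMs, "
              f"compute capability {p.major}.{p.minor}")

describe_gpus()
# An H100 SXM/PCIe should report roughly 80 GB and compute capability 9.0.
```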
Nvidia H100 Price – It’s costly – Believe it
The Nvidia H100 has been quite elusive since the very start. You can get it from various cloud GPU platforms like Azure, Google Cloud, AWS, Vultr, Lambda Labs, etc.
The Nvidia H100 80GB model comes at a whopping price of $30,000 or INR 2,461,419 (24+ Lacs).
The Nvidia H100 NVL 188GB model could cost about $60,000 or INR 4,922,838 (49+ Lacs), as it is essentially two H100 PCIe cards bridged via NVLink.
The Nvidia DGX H100 server containing 8 GPUs is said to cost about $520,000 or INR 42,663,660 (4.26+ Cr) with 5 years of support.

Nvidia H100 Benchmarks – As released by MLCommons
MLCommons released MLPerf Training 3.0 results for the Nvidia H100, & they are staggering. Nvidia's new machine learning / AI GPUs dominated every test they entered.
The LLM & BERT natural language processing benchmarks were run on a system co-developed by Nvidia & Inflection AI.
CoreWeave hosted the entire system of GPUs along with the related hardware.

The LLM benchmark was run with a model based on OpenAI's GPT-3, a 175-billion-parameter LLM.
Lambda Labs has estimated that training such a huge LLM requires about 3.14e23 FLOPs of compute. It is an expensive & time-consuming task, and the total time depends on how many GPUs are connected & the efficiency of each individual GPU.
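Lambda's figure lines up with the common "6 × parameters × tokens" rule of thumb for transformer training compute, assuming GPT-3's roughly 300 billion training tokens (per the GPT-3 paper):

```python
# Back-of-the-envelope check of Lambda Labs' 3.14e23 FLOPs figure using the
# common 6 * N * D rule of thumb for transformer training compute.
params = 175e9   # GPT-3 parameter count (N)
tokens = 300e9   # GPT-3 training tokens (D) -- assumption from the GPT-3 paper
total_flops = 6 * params * tokens
print(f"{total_flops:.2e} FLOPs")  # -> 3.15e+23, matching the quoted 3.14e23
```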
In comparison to the other accelerators included in the tests (none from AMD), the H100 set some amazing records: the Nvidia H100 Tensor Core GPU yielded a per-accelerator LLM training time of 548 hours (~23 days).

Now, Nvidia H100s are not consumer cards; they are built for enterprise-level training & inference. So it is safe to assume that H100s would be used in clusters rather than as single GPUs powering training & inference, as the sketch below illustrates.
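Nvidia & Inflection AI's actual training stack isn't public here, but as a rough illustration of what "using GPUs in a cluster" means in practice, here is a minimal PyTorch DistributedDataParallel sketch. The model, batch size & learning rate are placeholders, not anything from the MLPerf submission:

```python
# Minimal multi-GPU training sketch with PyTorch DistributedDataParallel.
# Illustrative only -- not Nvidia's or Inflection AI's actual stack.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK & WORLD_SIZE for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real LLM would go here
    model = torch.nn.Linear(4096, 4096).to(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(100):  # toy training loop with random data
        x = torch.randn(32, 4096, device=local_rank)
        loss = model(x).square().mean()
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces gradients across all GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with e.g. `torchrun --nproc_per_node=8 train.py`, every process drives one GPU of a DGX-style node, and gradients are synchronized over NVLink / the cluster interconnect; multi-node runs add `--nnodes` and a rendezvous address.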
Just to test how the H100 would perform in a cluster setup, Nvidia & Inflection AI co-developed a GPU cluster built on the Nvidia H100 Tensor Core GPU. It was again hosted & tested by CoreWeave.
Nvidia H100 Tensor Core GPU Cluster
The cluster had 3,584 Nvidia H100 accelerators paired with 896 4th-gen Intel Xeon Platinum 8462Y+ processors (marketed at $5,645 or INR 487,730.77 each). That's a really hefty combination for maximum workloads.
This cluster completed the GPT-3 175B LLM benchmark in just 11 minutes; in comparison, an Intel-based cluster took about 311 minutes.
Intel's hardware included 64-96 Intel Xeon Platinum 8380 processors with 256-384 Intel Habana Gaudi2 accelerators.
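A rough way to compare the two results is to normalize by accelerator count. Assuming the 311-minute figure came from the largest 384-accelerator Gaudi2 submission, the back-of-the-envelope math looks like this (scaling is never perfectly linear, so treat accelerator-hours as an approximation):

```python
# Rough normalization of the two MLPerf results by accelerator count.
# Assumption: the 311-minute Intel result used the 384-accelerator system.
h100_minutes, h100_accels = 11, 3584
gaudi2_minutes, gaudi2_accels = 311, 384

print(f"Wall-clock speedup: {gaudi2_minutes / h100_minutes:.1f}x")              # ~28.3x
print(f"H100 accelerator-hours:   {h100_minutes * h100_accels / 60:,.0f}")      # ~657
print(f"Gaudi2 accelerator-hours: {gaudi2_minutes * gaudi2_accels / 60:,.0f}")  # ~1,990
```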

It truly puts things in perspective for companies planning to deploy the Nvidia H100 Tensor Core GPU in their machine learning / AI environments.
Nvidia H100 Cloud GPU – Cost & Companies
Unless you already run a profitable AI company or have good funding to support your growing AI business, upgrading to the H100 is going to be quite costly.
As mentioned earlier in this article, Nvidia H100s are going to cost about $30,000-$50,000, & with several big players trying to snag them up as quickly as possible, demand is likely to inflate prices even further.
You will always have the cloud option for the Nvidia H100 Tensor Core GPU.
CoreWeave will rent you H100 GPUs for $2.23/hour ($1,605.60 per month, or INR 131,724.23).
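At that rate, a quick sketch of the rent-vs-buy math (ignoring power, cooling, networking & resale value) looks like this:

```python
# Hedged rent-vs-buy estimate using the CoreWeave rate quoted above.
# Ignores power, cooling, networking & resale value -- a rough sketch only.
hourly_rate = 2.23        # USD/hour (CoreWeave, as quoted above)
purchase_price = 30_000   # USD for an 80GB H100 (as quoted above)

monthly = hourly_rate * 24 * 30
breakeven_hours = purchase_price / hourly_rate
print(f"Monthly rental:   ${monthly:,.2f}")            # ~$1,605.60
print(f"Break-even after: {breakeven_hours:,.0f} hours "
      f"(~{breakeven_hours / (24 * 30):.1f} months of 24/7 use)")
```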
At the time of writing, Vultr displays "Contact Sales" for its H100 GPU offerings, so you will have to reach out to them for hourly rates.
I've created a separate blog post with information only about Nvidia H100 cloud GPU providers; it is updated as soon as new providers are found. Check out the link for H100 Cloud GPU providers.
What’s Next!
I am waiting for MLPerf benchmarks of AMD's machine learning GPUs, especially the AMD MI300X with its 192GB of HBM3 memory for LLMs.
Till then, you can check out the Top Open Source LLMs currently available on HuggingFace.
Thanks for reading.