FPT GPU Cloud Benchmark: Performance Comparison of GPUs for AI & Machine Learning

Benchmarking is crucial for evaluating GPU performance in AI and machine learning. This study measures training speed and scalability across various GPU types to help users choose the best fit for their workloads.

Beyond internal assessments, we compare FPT GPU Cloud’s performance with similar vendors to highlight key advantages in processing power, memory bandwidth, and scalability. These insights help customers select the most efficient GPU cloud services for their AI needs.

See FPT AI Factory’s fork of the Optimum Habana trainer code. The H100 benchmarks can be reproduced by following the instructions in that repository.

The following benchmarks use Habana’s Optimum Habana v1.7 trainer code to measure the NVIDIA HGX H100 and HGX H200 on FPT infrastructure and compare the results with those of similar vendors.

H100 results (samples per second): FPT’s Metal Cloud, K8S, DGX, and VM at batch size 54

Model	1 GPU	2 GPUs	3 GPUs	4 GPUs	6 GPUs	8 GPUs
Similar Vendor’s H100 80GB SXM	142.3	275	400.6	521.8	740.3	962.2
Compared to 1 GPU (times faster)		1.93	2.82	3.67	5.20	6.76

Metal Cloud – Bare Metal H100 80GB SXM	144.2	283.4	418.9	550.7	799.4	1056.3
Compared to Similar Vendor	101%	103%	105%	106%	108%	110%
Compared to 1 GPU (times faster)		1.97	2.91	3.82	5.54	7.33

FPT K8S H100 80GB SXM	143.8	282.4	417.0	546.7	792.8	1046.5
Compared to Similar Vendor	101%	103%	104%	105%	107%	109%
Compared to 1 GPU (times faster)		1.96	2.90	3.80	5.51	7.28

DGX H100 80GB SXM	143.8	282.2	417.2	547.7	793.4	1047.0
Compared to Similar Vendor	101%	103%	104%	105%	107%	109%
Compared to 1 GPU (times faster)		1.96	2.90	3.81	5.52	7.28

FPT VM H100 80GB SXM (no nvlink)	143.0	261.7	376.6	459.5
Compared to Similar Vendor	101%	95%	94%	88%
Compared to 1 GPU (times faster)		1.83	2.63	3.21
Compared to Metal Cloud	99%	92%	90%	83%
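
The derived rows in the table above can be recomputed from the raw throughput figures. The following is a minimal Python sketch, not the benchmark's own tooling: the function names are hypothetical, and the inputs are copied from the Similar Vendor and Metal Cloud rows.

```python
GPU_COUNTS = [1, 2, 3, 4, 6, 8]

# Throughput in samples per second, copied from the table above.
SIMILAR_VENDOR = [142.3, 275.0, 400.6, 521.8, 740.3, 962.2]
METAL_CLOUD = [144.2, 283.4, 418.9, 550.7, 799.4, 1056.3]


def speedup_vs_one_gpu(throughput):
    """Return each throughput divided by the 1-GPU throughput, to 2 decimals."""
    base = throughput[0]
    return [round(t / base, 2) for t in throughput]


def percent_of_baseline(throughput, baseline):
    """Return each throughput as a whole-number percentage of a baseline run."""
    return [round(100 * t / b) for t, b in zip(throughput, baseline)]


if __name__ == "__main__":
    speedups = speedup_vs_one_gpu(METAL_CLOUD)
    percents = percent_of_baseline(METAL_CLOUD, SIMILAR_VENDOR)
    for gpus, s, p in zip(GPU_COUNTS, speedups, percents):
        print(f"{gpus} GPU(s): {s}x vs 1 GPU, {p}% of similar vendor")
```

Because the inputs are already rounded to one decimal place, a recomputed value can differ from the published row by one unit in the last digit.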

H200 results (samples per second): FPT’s Metal Cloud at batch sizes 54, 95, and 110

Model	1 GPU	2 GPUs	3 GPUs	4 GPUs	6 GPUs	8 GPUs
Metal Cloud – Bare Metal H200 141GB SXM (bz54)	158.8	312.4	460.7	600.9	881.4	1165.1
Compared to Similar Vendor’s H100	112%	114%	115%	115%	119%	121%
Compared to Metal Cloud H100	110%	110%	110%	109%	110%	110%
Compared to Similar Vendor’s Baremetal H200	101%	101%	102%	101%	104%	105%
Compared to 1 GPU (times faster)		1.97	2.90	3.78	5.55	7.34

Metal Cloud – Bare Metal H200 141GB SXM (bz95)	169.4	332.9	489.2	649.7	917.4	1238.1
Compared to Similar Vendor’s H100	119%	121%	122%	125%	124%	129%
Compared to Metal Cloud H100	117%	117%	117%	118%	115%	117%
Compared to Similar Vendor’s Baremetal H200	107%	108%	108%	110%	108%	112%
Compared to 1 GPU (times faster)		1.97	2.89	3.84	5.42	7.31

Metal Cloud – Bare Metal H200 141GB SXM (bz110)	173.9	341.4	505.8	651.0	973.7	1190.0
Compared to Similar Vendor’s H100	122%	124%	126%	125%	132%	124%
Compared to Metal Cloud H100	121%	120%	121%	118%	122%	113%
Compared to Similar Vendor’s Baremetal H200	110%	111%	112%	110%	115%	107%
Compared to 1 GPU (times faster)		1.96	2.91	3.74	5.60	6.84
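
Since the H200 was measured at three batch sizes, it can be useful to see which batch size gives the highest throughput at each GPU count. The sketch below is illustrative only: `best_batch_size` is a hypothetical helper, and the data is copied from the three Metal Cloud H200 rows above.

```python
GPU_COUNTS = [1, 2, 3, 4, 6, 8]

# Metal Cloud H200 throughput (samples per second), keyed by batch size,
# copied from the table above.
H200_THROUGHPUT = {
    54: [158.8, 312.4, 460.7, 600.9, 881.4, 1165.1],
    95: [169.4, 332.9, 489.2, 649.7, 917.4, 1238.1],
    110: [173.9, 341.4, 505.8, 651.0, 973.7, 1190.0],
}


def best_batch_size(throughput_by_batch):
    """Map each GPU count to the (batch size, throughput) pair with the highest throughput."""
    best = {}
    for i, gpus in enumerate(GPU_COUNTS):
        # Pick the batch size whose i-th measurement is the largest.
        batch = max(throughput_by_batch, key=lambda b: throughput_by_batch[b][i])
        best[gpus] = (batch, throughput_by_batch[batch][i])
    return best


if __name__ == "__main__":
    for gpus, (batch, rate) in best_batch_size(H200_THROUGHPUT).items():
        print(f"{gpus} GPU(s): batch size {batch} at {rate} samples/s")
```

On these figures, batch size 110 is fastest from 1 to 6 GPUs, while batch size 95 is fastest at 8 GPUs (1238.1 vs. 1190.0 samples per second), so the largest batch size is not always the best choice.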

FPT AI Factory optimizes GPU performance through advanced infrastructure and software enhancements.

Metal Cloud delivers the highest H100 throughput at every GPU count, and its lead over the similar vendor widens as GPUs are added: from 101% of the vendor’s throughput at 1 GPU to 110% at 8 GPUs.
FPT K8S performs slightly lower than Metal Cloud due to additional overhead but stays within about 1% at every GPU count; DGX results are nearly identical to K8S.
FPT VM (without NVLink) falls behind as GPUs are added, dropping from 99% of Metal Cloud throughput at 1 GPU to 83% at 4 GPUs, which underlines NVLink’s role in multi-GPU scaling.
Across all configurations, scaling is sublinear, with diminishing returns as the number of GPUs increases, though Metal Cloud scales best among the H100 configurations (7.33× at 8 GPUs). Meanwhile, the HGX H200, with its larger VRAM (141GB vs. 80GB) and higher memory bandwidth (4.8TB/s vs. 3.35TB/s), supports larger batch sizes: at batch size 110 it delivers 13% to 22% more throughput than Metal Cloud H100, depending on GPU count.
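
The scaling observations above can be expressed as scaling efficiency: measured throughput divided by the throughput that perfectly linear scaling would give. This is a minimal sketch under that definition, with a hypothetical function name and inputs copied from the H100 table.

```python
GPU_COUNTS = [1, 2, 3, 4, 6, 8]

# H100 throughput in samples per second, copied from the H100 table.
METAL_CLOUD_H100 = [144.2, 283.4, 418.9, 550.7, 799.4, 1056.3]
VM_H100_NO_NVLINK = [143.0, 261.7, 376.6, 459.5]  # measured up to 4 GPUs only


def scaling_efficiency(throughput, gpu_counts):
    """Return throughput / (n * 1-GPU throughput); 1.0 means perfectly linear scaling."""
    base = throughput[0]
    # zip stops at the shorter sequence, so partial rows (such as the VM) work too.
    return [t / (n * base) for t, n in zip(throughput, gpu_counts)]


if __name__ == "__main__":
    for name, row in [("Metal Cloud", METAL_CLOUD_H100), ("VM, no NVLink", VM_H100_NO_NVLINK)]:
        for n, eff in zip(GPU_COUNTS, scaling_efficiency(row, GPU_COUNTS)):
            print(f"{name}: {n} GPU(s) -> {eff:.1%} of linear")
```

With these inputs, Metal Cloud retains about 92% of linear scaling at 8 GPUs. At 4 GPUs it retains about 95%, compared with about 80% for the VM without NVLink.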

For more information and consultancy about FPT AI Factory, please contact:

Hotline: 1900 638 399
Email: support@fptcloud.com
Support: m.me/fptsmartcloud
