The page has been translated by Gen AI.

Overview

Service Overview

Multi-node GPU Cluster is a service that provides physical GPU servers, without virtualization, for large-scale, high-performance AI computation. By clustering the GPUs of two or more GPU-equipped Bare Metal Servers, you can run workloads across many GPUs and operate the servers together with Samsung Cloud Platform’s high-performance storage and networking services.

Provided Features

The Multi-node GPU Cluster provides the following features.

  • Auto Provisioning and Management: Through the web-based Console, you can provision standard GPU Bare Metal servers, each equipped with 8 GPUs, and manage their resources and costs.
  • Network Connection: You can cluster the GPUs of two or more Bare Metal Servers over high-speed interconnects. By configuring a GPU Direct RDMA (Remote Direct Memory Access) environment, data I/O is processed directly between GPU memories without passing through host (CPU) memory, enabling high-speed AI/Machine Learning computation.
  • Storage Connection: Provides various types of additional attached storage besides the OS disk. High-performance SSD NAS File Storage, Block Storage, and Object Storage, directly integrated over a high-speed network, can also be used together.
  • Network Configuration Management: You can change the server’s subnet/IP from the values set at initial creation. NAT IP can be enabled or disabled as needed.
  • Monitoring: You can view monitoring information for computing resources such as CPU, GPU, Memory, and Disk through Cloud Monitoring. To use Cloud Monitoring with a Multi-node GPU Cluster, you must install the Agent; install it to ensure stable service. For more details, refer to Multi-node GPU Cluster Monitoring Metrics.
  • Terraform Provisioning: Provides an Infrastructure as Code (IaC) environment through Terraform.
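As an illustration of how a workload might span the clustered servers, the following Python sketch builds one torchrun launch command per node, assuming a PyTorch training script and 8 GPUs per server. PyTorch, torchrun, the script name train.py, the node IP addresses, and port 29500 are assumptions for illustration only; they are not part of the Multi-node GPU Cluster service itself.

```python
from dataclasses import dataclass
from typing import List

# Assumption: every node is a standard GPU Bare Metal server with 8 GPUs.
GPUS_PER_NODE = 8


@dataclass(frozen=True)
class LaunchPlan:
    world_size: int      # total number of GPU worker processes across all nodes
    commands: List[str]  # one torchrun command line per node, indexed by node rank


def plan_torchrun(node_ips: List[str], script: str, master_port: int = 29500) -> LaunchPlan:
    """Build one torchrun command per node; node 0 hosts the rendezvous endpoint."""
    if len(node_ips) < 2:
        raise ValueError("a Multi-node GPU Cluster uses two or more Bare Metal Servers")
    master_addr = node_ips[0]
    commands = [
        f"torchrun --nnodes={len(node_ips)} --nproc_per_node={GPUS_PER_NODE} "
        f"--node_rank={rank} --master_addr={master_addr} --master_port={master_port} "
        f"{script}"
        for rank in range(len(node_ips))
    ]
    return LaunchPlan(world_size=len(node_ips) * GPUS_PER_NODE, commands=commands)


# Hypothetical private IPs of two cluster nodes.
plan = plan_torchrun(["10.0.0.11", "10.0.0.12"], "train.py")
print(plan.world_size)  # 16
for cmd in plan.commands:
    print(cmd)
```

Running the first command on the first node and the second command on the second node would start 16 worker processes in total. PyTorch GPU collectives typically go through NCCL, which selects the RDMA transport on its own; setting the environment variable NCCL_DEBUG=INFO makes NCCL log which transport it chose.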

Components

Multi-node GPU Cluster provides GPUs in the form of Bare Metal Servers, offered with standard images and predefined server types. The servers include NVSwitch and NVLink for high-bandwidth GPU-to-GPU communication within each server.

Specifications by GPU Type

A GPU (Graphics Processing Unit) is specialized for parallel operations that process large amounts of data quickly, enabling large-scale parallel computation in fields such as artificial intelligence (AI) and data analysis.

The following are the specifications of GPU types offered by the Multi-node GPU Cluster service.

| Category | H100 Type | B300 Type |
| --- | --- | --- |
| GPU Architecture | NVIDIA Hopper | NVIDIA Blackwell Ultra |
| GPU Memory | 80 GiB | 268 GiB |
| GPU Transistors | 80 billion, TSMC 4N | 208 billion, TSMC 4NP |
| FP16 Tensor Core (Dense) | 989 TFLOPS | 2.25 PFLOPS |
| FP8 Tensor Core (Dense) | 1,979 TFLOPS | 4.5 PFLOPS |
| FP4 Tensor Core (Dense) | Not supported | 13.5 PFLOPS |
| GPU Memory Bandwidth | 3,352 GB/s (HBM3) | 8 TB/s (HBM3e) |
| NVLink Generation | NVLink 4 | NVLink 5 |
| NVLink Signaling Rate | 25 GB/s (x18) | 50 GB/s (x18) |
| NVSwitch GPU-to-GPU Bandwidth | 900 GB/s | 1.8 TB/s |
| Total NVSwitch Aggregate Bandwidth | 7.2 TB/s | 14.4 TB/s |
Table. GPU Type specifications
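The bandwidth rows of the table are related by simple arithmetic. The sketch below reproduces them, assuming that the "Signaling Rate" row gives the per-direction bandwidth of each of the 18 NVLink links per GPU and that the GPU-to-GPU figure counts both directions; under those assumptions the per-GPU and 8-GPU aggregate values match the table.

```python
from typing import Tuple

# Assumptions: the per-link figure is per direction, each GPU has 18 NVLink links,
# and each server holds 8 GPUs.
NVLINK_LINKS_PER_GPU = 18
GPUS_PER_SERVER = 8


def nvlink_bandwidth(per_link_gb_s: int) -> Tuple[int, int]:
    """Return (per-GPU bidirectional GB/s, per-server aggregate GB/s)."""
    per_gpu = per_link_gb_s * NVLINK_LINKS_PER_GPU * 2  # 18 links, both directions
    return per_gpu, per_gpu * GPUS_PER_SERVER


for gpu, per_link in {"H100": 25, "B300": 50}.items():
    per_gpu, aggregate = nvlink_bandwidth(per_link)
    print(f"{gpu}: {per_gpu} GB/s per GPU, {aggregate / 1000} TB/s aggregate")
# H100: 900 GB/s per GPU, 7.2 TB/s aggregate
# B300: 1800 GB/s per GPU, 14.4 TB/s aggregate
```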

OS and GPU driver version

The operating systems (OS) supported by the Multi-node GPU Cluster are as follows.

| OS | OS version | GPU driver version |
| --- | --- | --- |
| Ubuntu | 22.04 | 535.86.10, 535.183.06 |
| Ubuntu | 24.04 | 580.105.08 |
Table. Multi-node GPU Cluster OS and GPU driver version
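To confirm that a server runs one of the supported combinations, you could compare its OS release and installed driver against the table. The following Python sketch is one way to do that; it assumes the standard /etc/os-release file and the nvidia-smi utility that ships with the NVIDIA driver, and the function names are hypothetical.

```python
import subprocess
from typing import Dict, Set, Tuple

# Supported OS/driver combinations, copied from the table above.
SUPPORTED_DRIVERS: Dict[Tuple[str, str], Set[str]] = {
    ("Ubuntu", "22.04"): {"535.86.10", "535.183.06"},
    ("Ubuntu", "24.04"): {"580.105.08"},
}


def parse_os_release(text: str) -> Tuple[str, str]:
    """Return (NAME, VERSION_ID) from the contents of /etc/os-release."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields.get("NAME", ""), fields.get("VERSION_ID", "")


def is_supported(os_name: str, os_version: str, driver: str) -> bool:
    """True if the driver version is listed for this OS name and version."""
    return driver in SUPPORTED_DRIVERS.get((os_name, os_version), set())


def installed_driver() -> str:
    """Query the driver version via nvidia-smi; all GPUs in a server share one driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


def check_this_server() -> bool:
    """Run on a cluster node: compare the local OS and driver against the table."""
    with open("/etc/os-release") as f:
        name, version = parse_os_release(f.read())
    return is_supported(name, version, installed_driver())
```

On a cluster node, check_this_server() returns True only when the OS version and the driver version appear together in the table; the parsing and lookup functions can be used without a GPU.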

Server type

The format of server types provided by the Multi-node GPU Cluster is as follows.

  • Example: when the server type is g2c96h8_metal

| Category | Example | Detailed description |
| --- | --- | --- |
| Server generation | g2 | Provided server generation. g2: g means GPU server, and 2 means the generation |
| CPU | c96 | Number of cores. c96: 96 allocated cores, which are physical cores |
| GPU | h8 | GPU type and quantity. h8: h means the GPU type, and 8 means the GPU quantity |
Table. Multi-node GPU Cluster server type format
Reference
For detailed information about the server types provided by Multi-node GPU Cluster, refer to Multi-node GPU Cluster Server Types.
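Because the server type name follows a fixed pattern, it can be split into its parts programmatically. The Python sketch below does this with a regular expression derived from the format table; the pattern, the ServerType class, and parse_server_type are illustrative names, and the trailing _metal part is kept as an uninterpreted suffix because the format table does not describe it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Derived from the documented format: <class><generation>c<cores><gpu type><gpu count>,
# optionally followed by an underscore suffix such as "_metal" (kept as-is).
SERVER_TYPE = re.compile(
    r"^(?P<cls>[a-z])(?P<gen>\d+)"
    r"c(?P<cores>\d+)"
    r"(?P<gpu_type>[a-z])(?P<gpus>\d+)"
    r"(?:_(?P<suffix>\w+))?$"
)


@dataclass(frozen=True)
class ServerType:
    server_class: str      # "g" means GPU server
    generation: int        # e.g. 2 for g2
    cores: int             # allocated physical CPU cores
    gpu_type: str          # single-letter GPU type code, e.g. "h"
    gpu_count: int         # number of GPUs in the server
    suffix: Optional[str]  # trailing part after "_", if any


def parse_server_type(name: str) -> ServerType:
    m = SERVER_TYPE.match(name)
    if m is None:
        raise ValueError(f"unrecognized server type: {name!r}")
    return ServerType(
        server_class=m["cls"],
        generation=int(m["gen"]),
        cores=int(m["cores"]),
        gpu_type=m["gpu_type"],
        gpu_count=int(m["gpus"]),
        suffix=m["suffix"],
    )


print(parse_server_type("g2c96h8_metal"))
# ServerType(server_class='g', generation=2, cores=96, gpu_type='h', gpu_count=8, suffix='metal')
```

A regular expression keeps the format rules in one place; a name that does not follow the documented pattern raises ValueError instead of being silently misread.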

Prerequisite Services

The following services must be configured before you create this service. Refer to the guide for each service and prepare them in advance.

| Service category | Service | Detailed description |
| --- | --- | --- |
| Networking | VPC | A service that provides an isolated virtual network in a cloud environment |
Table. Multi-node GPU Cluster prerequisite services