Overview

Service Overview

GPU Server is a virtualized computing service that lets you allocate infrastructure resources such as CPU, GPU, and memory on demand, in the amounts you need and at the time you need them, without purchasing hardware individually. It is suited to tasks that require fast computation in a cloud environment, such as AI model experimentation, prediction, and inference, and lets you flexibly select resources with performance optimized for the type and scale of each task. GPU Server provides the following features:

Provided Features

  • GPU Server Management: You can manage the full lifecycle yourself (Self Service) through the web-based Console, from GPU Server provisioning to monitoring and billing, including creation, deletion, and changes.
  • Provisioning by GPU Quantity: You can configure a virtual server by freely selecting the quantity of H100/A100 GPUs according to project purpose and scale.
  • High Performance GPU Provision: Provides high-performance GPU servers at the physical server level using the Pass-through method.
  • Storage Connection: Provides additional connected storage besides OS disks. You can connect and use Block Storage, File Storage, and Object Storage.
  • Strong Security Application: Protects servers safely by controlling Inbound/Outbound traffic exchanged with external internet or other VPCs (Virtual Private Cloud) through the Security Group service.
  • Monitoring: You can check monitoring information such as CPU, Memory, Disk, and GPU status corresponding to computing resources through the Cloud Monitoring service.
  • Network Setting Management: The server's subnet/IP can easily be changed from the values set at initial creation, and a NAT IP can be enabled or released as needed.
  • Key Pair Method: Provides a Key Pair method instead of ID/PW access for secure OS access.
  • Image Management: You can create and manage Custom Images and share them between projects.
  • ServiceWatch Integration: You can monitor metric data through the ServiceWatch service.

Components

GPU Server provides GPU, NVSwitch, and NVLink on top of virtualized computing resources.

Warning
  • NVSwitch can be activated and used only for instance types that allocate 8 GPUs to a single GPU Server.

Specifications by GPU Type

A GPU (Graphics Processing Unit) performs the calculations needed to render the images that make up a computer screen. Because it is specialized for parallel processing, it can process large amounts of data quickly, making it well suited to large-scale parallel workloads such as artificial intelligence (AI) and data analysis.

The following are the specifications of GPU types provided by the GPU Server service.

| Item | A100 Type | H100 Type |
|---|---|---|
| Service Provision Method | Pass-through | Pass-through |
| GPU Architecture | NVIDIA Ampere | NVIDIA Hopper |
| GPU Memory | 80 GB | 80 GB |
| GPU Transistors | 54 billion (TSMC 7N) | 80 billion (TSMC 4N) |
| FP16 Tensor Core (Dense) | 312 TFLOPS | 989 TFLOPS |
| FP8 Tensor Core (Dense) | Not supported | 1,979 TFLOPS |
| FP4 Tensor Core (Dense) | Not supported | Not supported |
| GPU Memory Bandwidth | 2,039 GB/s (HBM2e) | 3,352 GB/s (HBM3) |
| NVLink Performance | NVLink 3 | NVLink 4 |
| NVLink Signaling Rate | 25 GB/s (x12) | 25 GB/s (x18) |
| NVSwitch GPU-to-GPU Bandwidth | 600 GB/s | 900 GB/s |
| Total NVSwitch Aggregate Bandwidth | 4.8 TB/s | 7.2 TB/s |

Table. GPU Type Specifications
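The NVLink and NVSwitch rows above are arithmetically consistent: per-GPU bandwidth equals the signaling rate times the link count in both directions, and the aggregate figure equals the per-GPU bandwidth times the 8 GPUs behind NVSwitch. A quick check in Python (a sketch; the per-link, per-direction reading of the signaling rate is an assumption):

```python
# Sanity-check the NVLink/NVSwitch figures from the specifications table.

def gpu_to_gpu_bandwidth(rate_gbps: float, links: int) -> float:
    """Bidirectional GPU-to-GPU bandwidth in GB/s (rate per link per direction)."""
    return rate_gbps * links * 2

def aggregate_bandwidth(per_gpu_gbps: float, gpus: int = 8) -> float:
    """Total NVSwitch aggregate bandwidth in TB/s for an 8-GPU server."""
    return per_gpu_gbps * gpus / 1000

a100 = gpu_to_gpu_bandwidth(25, 12)  # NVLink 3: 600 GB/s
h100 = gpu_to_gpu_bandwidth(25, 18)  # NVLink 4: 900 GB/s
print(a100, aggregate_bandwidth(a100))  # 600 GB/s -> 4.8 TB/s
print(h100, aggregate_bandwidth(h100))  # 900 GB/s -> 7.2 TB/s
```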

Server Type

The server types provided by GPU Server are as follows. For detailed descriptions, see GPU Server Server Types.

| Item | Server Type | CPU vCore | Memory (GB) | GPU Quantity |
|---|---|---|---|---|
| GPU-A100-1 | g1v16a1 | 16 | 234 | 1 |
| GPU-A100-1 | g1v32a2 | 32 | 468 | 2 |
| GPU-A100-1 | g1v64a4 | 64 | 936 | 4 |
| GPU-A100-1 | g1v128a8 | 128 | 1,872 | 8 |
| GPU-H100-2 | g2v12h1 | 12 | 234 | 1 |
| GPU-H100-2 | g2v24h2 | 24 | 468 | 2 |
| GPU-H100-2 | g2v48h4 | 48 | 936 | 4 |
| GPU-H100-2 | g2v96h8 | 96 | 1,872 | 8 |

Table. GPU Server Server Types

OS and GPU Driver Version

The operating systems (OS) supported by GPU Server are as follows:

| OS | OS Version | GPU Driver Version |
|---|---|---|
| Ubuntu | 22.04 | 535.183.06 |
| Ubuntu | 24.04 | 570.195.03 |
| RHEL | 8.10 | 535.183.06 |

Table. GPU Server OS and GPU Driver Version
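The OS-to-driver mapping above can be encoded as a small lookup table, for example in a provisioning or validation script (a sketch; the function name is illustrative, and the values are taken from the table above):

```python
# Supported OS images and the GPU driver version each ships with.
GPU_DRIVER_VERSIONS = {
    ("Ubuntu", "22.04"): "535.183.06",
    ("Ubuntu", "24.04"): "570.195.03",
    ("RHEL", "8.10"): "535.183.06",
}

def driver_for(os_name: str, os_version: str) -> str:
    """Return the GPU driver version for a supported OS, or raise."""
    try:
        return GPU_DRIVER_VERSIONS[(os_name, os_version)]
    except KeyError:
        raise ValueError(f"Unsupported OS: {os_name} {os_version}")

print(driver_for("Ubuntu", "24.04"))  # 570.195.03
```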

Prerequisite Services

The following services must be prepared before creating a GPU Server. Prepare them in advance by referring to each service's user guide.

| Service Category | Service | Description |
|---|---|---|
| Networking | VPC | Provides independent virtual networks in a cloud environment |
| Networking | Security Group | Virtual firewall that controls server traffic |

Table. GPU Server Prerequisite Services

1 - Server Type

GPU Server Server Type

GPU Server is classified by the GPU type it provides, and the GPU used by a GPU Server is determined by the server type selected at creation. Select a server type that matches the requirements of the application you plan to run on the GPU Server.

The server types supported by the GPU Server are as follows.

The server type name indicates the GPU type, generation, vCore count, and GPU quantity. The following uses GPU-H100-2 g2v12h1 as an example.

| Category | Example | Detailed Description |
|---|---|---|
| Server Type | GPU-H100-2 | Server type classification. GPU-H100 and GPU-A100 indicate the provided GPU type, and the trailing number (2 or 1) indicates the generation. |
| Server specifications | g2 | Server classification and generation. g means GPU server; 2 means the generation. |
| Server specifications | v12 | Number of vCores. v12 means 12 virtual cores. |
| Server specifications | h1 | GPU type and quantity. h means GPU-H100 and a means GPU-A100; the number is the GPU quantity (h1: one H100 GPU, a2: two A100 GPUs). |

Table. GPU Server server type format
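The naming convention above is regular enough to parse mechanically, which can be useful for tooling. A minimal sketch (the function name is illustrative, not part of the service):

```python
import re

# Parse a GPU Server type string such as "g2v12h1" into its parts,
# following the naming convention described in the table above.
TYPE_PATTERN = re.compile(r"^g(?P<gen>\d+)v(?P<vcores>\d+)(?P<gpu>[ah])(?P<count>\d+)$")
GPU_NAMES = {"a": "GPU-A100", "h": "GPU-H100"}

def parse_server_type(name: str) -> dict:
    m = TYPE_PATTERN.match(name)
    if not m:
        raise ValueError(f"Not a recognized server type: {name}")
    return {
        "generation": int(m.group("gen")),
        "vcores": int(m.group("vcores")),
        "gpu_type": GPU_NAMES[m.group("gpu")],
        "gpu_count": int(m.group("count")),
    }

print(parse_server_type("g2v12h1"))
# {'generation': 2, 'vcores': 12, 'gpu_type': 'GPU-H100', 'gpu_count': 1}
```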

g1 server type

The g1 server type is a GPU Server that uses NVIDIA A100 Tensor Core GPUs and is suitable for high-performance applications.

  • Provides up to 8 NVIDIA A100 Tensor Core GPUs
  • Equipped with 6,912 CUDA cores and 432 Tensor cores per GPU
  • Supports up to 128 vCPUs and 1,872 GB of memory
  • Networking speed of up to 40 Gbps
  • 600 GB/s GPU-to-GPU peer-to-peer communication via NVIDIA NVSwitch
| Category | Server Type | GPU | CPU | Memory | GPU Memory | Network Bandwidth |
|---|---|---|---|---|---|---|
| GPU-A100-1 | g1v16a1 | 1 | 16 vCore | 234 GB | 80 GB | Up to 20 Gbps |
| GPU-A100-1 | g1v32a2 | 2 | 32 vCore | 468 GB | 160 GB | Up to 20 Gbps |
| GPU-A100-1 | g1v64a4 | 4 | 64 vCore | 936 GB | 320 GB | Up to 40 Gbps |
| GPU-A100-1 | g1v128a8 | 8 | 128 vCore | 1,872 GB | 640 GB | Up to 40 Gbps |

Table. GPU Server server type > GPU-A100-1 server type

g2 server type

The g2 server type is a GPU Server that uses NVIDIA H100 Tensor Core GPUs and is suitable for high-performance applications.

  • Provides up to 8 NVIDIA H100 Tensor Core GPUs
  • Equipped with 16,896 CUDA cores and 528 Tensor cores per GPU
  • Supports up to 96 vCPUs and 1,872 GB of memory
  • Networking speed of up to 40 Gbps
  • 900 GB/s GPU-to-GPU peer-to-peer communication via NVIDIA NVSwitch
| Category | Server Type | GPU | CPU | Memory | GPU Memory | Network Bandwidth |
|---|---|---|---|---|---|---|
| GPU-H100-2 | g2v12h1 | 1 | 12 vCore | 234 GB | 80 GB | Up to 20 Gbps |
| GPU-H100-2 | g2v24h2 | 2 | 24 vCore | 468 GB | 160 GB | Up to 20 Gbps |
| GPU-H100-2 | g2v48h4 | 4 | 48 vCore | 936 GB | 320 GB | Up to 40 Gbps |
| GPU-H100-2 | g2v96h8 | 8 | 96 vCore | 1,872 GB | 640 GB | Up to 40 Gbps |

Table. GPU Server server type > GPU-H100-2 server type
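In both server type tables, memory and GPU memory scale linearly with GPU count from the single-GPU baseline. A quick consistency check (figures are taken from the tables above):

```python
# Per-GPU building block for each type, from the server type tables above.
A100_BASE = {"vcores": 16, "memory_gb": 234, "gpu_memory_gb": 80}
H100_BASE = {"vcores": 12, "memory_gb": 234, "gpu_memory_gb": 80}

def scale(base: dict, gpus: int) -> dict:
    """Resources for a server with `gpus` GPUs, scaling the 1-GPU baseline."""
    return {k: v * gpus for k, v in base.items()}

print(scale(A100_BASE, 8))  # {'vcores': 128, 'memory_gb': 1872, 'gpu_memory_gb': 640}
print(scale(H100_BASE, 8))  # {'vcores': 96, 'memory_gb': 1872, 'gpu_memory_gb': 640}
```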

2 - Monitoring Metrics

GPU Server Monitoring Metrics

The following table shows the monitoring metrics of the GPU Server that can be checked through Cloud Monitoring.

Basic monitoring metrics are provided even without installing an Agent; see Table. GPU Server Basic Monitoring Metrics (Basic) below. Additional metrics that can be collected by installing an Agent are listed in Table. GPU Server Additional Monitoring Metrics (Agent Installation Required) below.

For detailed Cloud Monitoring usage, please refer to the Cloud Monitoring guide.

| Performance Item Name | Description | Unit |
|---|---|---|
| Memory Total [Basic] | Total available memory in bytes | bytes |
| Memory Used [Basic] | Currently used memory in bytes | bytes |
| Memory Swap In [Basic] | Memory swapped in, in bytes | bytes |
| Memory Swap Out [Basic] | Memory swapped out, in bytes | bytes |
| Memory Free [Basic] | Unused memory in bytes | bytes |
| Disk Read Bytes [Basic] | Read bytes | bytes |
| Disk Read Requests [Basic] | Number of read requests | cnt |
| Disk Write Bytes [Basic] | Written bytes | bytes |
| Disk Write Requests [Basic] | Number of write requests | cnt |
| CPU Usage [Basic] | Average system CPU usage over 1 minute | % |
| Instance State [Basic] | Instance state | state |
| Network In Bytes [Basic] | Received bytes | bytes |
| Network In Dropped [Basic] | Dropped received packets | cnt |
| Network In Packets [Basic] | Received packets | cnt |
| Network Out Bytes [Basic] | Sent bytes | bytes |
| Network Out Dropped [Basic] | Dropped sent packets | cnt |
| Network Out Packets [Basic] | Sent packets | cnt |

Table. GPU Server Basic Monitoring Metrics (Basic)

| Performance Item Name | Description | Unit |
|---|---|---|
| GPU Count | Number of GPUs | cnt |
| GPU Memory Usage | GPU memory usage rate | % |
| GPU Memory Used | Used GPU memory | MB |
| GPU Temperature | GPU temperature | °C |
| GPU Usage | GPU utilization | % |
| GPU Usage [Avg] | Average GPU usage rate | % |
| GPU Power Cap | Maximum power capacity of the GPU | W |
| GPU Power Usage | Current power usage of the GPU | W |
| GPU Memory Usage [Avg] | Average GPU memory usage rate | % |
| GPU Count in use | Number of GPUs in use by jobs on the node | cnt |
| Execution Status for nvidia-smi | Execution result of the nvidia-smi command | status |
| Core Usage [IO Wait] | CPU time spent in IO wait state | % |
| Core Usage [System] | CPU time spent in system space | % |
| Core Usage [User] | CPU time spent in user space | % |
| CPU Cores | Number of CPU cores on the host | cnt |
| CPU Usage [Active] | CPU time used, excluding idle and IO wait states | % |
| CPU Usage [Idle] | CPU time spent in idle state | % |
| CPU Usage [IO Wait] | CPU time spent in IO wait state | % |
| CPU Usage [System] | CPU time used by the kernel | % |
| CPU Usage [User] | CPU time used by user space | % |
| CPU Usage/Core [Active] | CPU time used per core, excluding idle and IO wait states | % |
| CPU Usage/Core [Idle] | CPU time spent in idle state per core | % |
| CPU Usage/Core [IO Wait] | CPU time spent in IO wait state per core | % |
| CPU Usage/Core [System] | CPU time used by the kernel per core | % |
| CPU Usage/Core [User] | CPU time used by user space per core | % |
| Disk CPU Usage [IO Request] | CPU time spent on IO requests | % |
| Disk Queue Size [Avg] | Average queue length of requests | num |
| Disk Read Bytes | Bytes read from the device per second | bytes |
| Disk Read Bytes [Delta Avg] | Average delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Max] | Maximum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Min] | Minimum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Sum] | Sum of delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta] | Delta of bytes read from the device | bytes |
| Disk Read Bytes [Success] | Total bytes successfully read | bytes |
| Disk Read Requests | Number of read requests to the device per second | cnt |
| Disk Read Requests [Delta Avg] | Average delta of read requests to the device | cnt |
| Disk Read Requests [Delta Max] | Maximum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Min] | Minimum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Sum] | Sum of delta of read requests to the device | cnt |
| Disk Read Requests [Success Delta] | Delta of successful read requests to the device | cnt |
| Disk Read Requests [Success] | Total successful read requests | cnt |
| Disk Request Size [Avg] | Average size of requests to the device | num |
| Disk Service Time [Avg] | Average service time of requests to the device | ms |
| Disk Wait Time [Avg] | Average wait time of requests to the device | ms |
| Disk Wait Time [Read] | Average read wait time of the device | ms |
| Disk Wait Time [Write] | Average write wait time of the device | ms |
| Disk Write Bytes [Delta Avg] | Average delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Max] | Maximum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Min] | Minimum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Sum] | Sum of delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta] | Delta of bytes written to the device | bytes |
| Disk Write Bytes [Success] | Total bytes successfully written | bytes |
| Disk Write Requests | Number of write requests to the device per second | cnt |
| Disk Write Requests [Delta Avg] | Average delta of write requests to the device | cnt |
| Disk Write Requests [Delta Max] | Maximum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Min] | Minimum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Sum] | Sum of delta of write requests to the device | cnt |
| Disk Write Requests [Success Delta] | Delta of successful write requests to the device | cnt |
| Disk Write Requests [Success] | Total successful write requests | cnt |
| Disk Writes Bytes | Bytes written to the device per second | bytes |
| Filesystem Hang Check | Filesystem hang check (normal: 1, abnormal: 0) | status |
| Filesystem Nodes | Total number of filesystem nodes | cnt |
| Filesystem Nodes [Free] | Total number of available filesystem nodes | cnt |
| Filesystem Size [Available] | Available disk space in bytes | bytes |
| Filesystem Size [Free] | Free disk space in bytes | bytes |
| Filesystem Size [Total] | Total disk space in bytes | bytes |
| Filesystem Usage | Disk space usage rate | % |
| Filesystem Usage [Avg] | Average disk space usage rate | % |
| Filesystem Usage [Inode] | Inode usage rate | % |
| Filesystem Usage [Max] | Maximum disk space usage rate | % |
| Filesystem Usage [Min] | Minimum disk space usage rate | % |
| Filesystem Usage [Total] | Total disk space usage rate | % |
| Filesystem Used | Used disk space in bytes | bytes |
| Filesystem Used [Inode] | Used inode space in bytes | bytes |
| Memory Free | Total available memory in bytes | bytes |
| Memory Free [Actual] | Actual available memory in bytes | bytes |
| Memory Free [Swap] | Available swap memory in bytes | bytes |
| Memory Total | Total memory in bytes | bytes |
| Memory Total [Swap] | Total swap memory in bytes | bytes |
| Memory Usage | Memory usage rate | % |
| Memory Usage [Actual] | Actual memory usage rate | % |
| Memory Usage [Cache Swap] | Cache swap usage rate | % |
| Memory Usage [Swap] | Swap memory usage rate | % |
| Memory Used | Used memory in bytes | bytes |
| Memory Used [Actual] | Actual used memory in bytes | bytes |
| Memory Used [Swap] | Used swap memory in bytes | bytes |
| Collisions | Network collisions | cnt |
| Network In Bytes | Received bytes | bytes |
| Network In Bytes [Delta Avg] | Average delta of received bytes | bytes |
| Network In Bytes [Delta Max] | Maximum delta of received bytes | bytes |
| Network In Bytes [Delta Min] | Minimum delta of received bytes | bytes |
| Network In Bytes [Delta Sum] | Sum of delta of received bytes | bytes |
| Network In Bytes [Delta] | Delta of received bytes | bytes |
| Network In Dropped | Dropped received packets | cnt |
| Network In Errors | Received errors | cnt |
| Network In Packets | Received packets | cnt |
| Network In Packets [Delta Avg] | Average delta of received packets | cnt |
| Network In Packets [Delta Max] | Maximum delta of received packets | cnt |
| Network In Packets [Delta Min] | Minimum delta of received packets | cnt |
| Network In Packets [Delta Sum] | Sum of delta of received packets | cnt |
| Network In Packets [Delta] | Delta of received packets | cnt |
| Network Out Bytes | Sent bytes | bytes |
| Network Out Bytes [Delta Avg] | Average delta of sent bytes | bytes |
| Network Out Bytes [Delta Max] | Maximum delta of sent bytes | bytes |
| Network Out Bytes [Delta Min] | Minimum delta of sent bytes | bytes |
| Network Out Bytes [Delta Sum] | Sum of delta of sent bytes | bytes |
| Network Out Bytes [Delta] | Delta of sent bytes | bytes |
| Network Out Dropped | Dropped sent packets | cnt |
| Network Out Errors | Sent errors | cnt |
| Network Out Packets | Sent packets | cnt |
| Network Out Packets [Delta Avg] | Average delta of sent packets | cnt |
| Network Out Packets [Delta Max] | Maximum delta of sent packets | cnt |
| Network Out Packets [Delta Min] | Minimum delta of sent packets | cnt |
| Network Out Packets [Delta Sum] | Sum of delta of sent packets | cnt |
| Network Out Packets [Delta] | Delta of sent packets | cnt |
| Open Connections [TCP] | Open TCP connections | cnt |
| Open Connections [UDP] | Open UDP connections | cnt |
| Port Usage | Port usage rate | % |
| SYN Sent Sockets | Number of sockets in SYN_SENT state | cnt |
| Kernel PID Max | Maximum PID value | cnt |
| Kernel Thread Max | Maximum thread value | cnt |
| Process CPU Usage | CPU time used by the process | % |
| Process CPU Usage/Core | CPU time used by the process per core | % |
| Process Memory Usage | Resident Set size | % |
| Process Memory Used | Used memory by the process | bytes |
| Process PID | Process ID | PID |
| Process PPID | Parent process ID | PID |
| Processes [Dead] | Number of dead processes | cnt |
| Processes [Idle] | Number of idle processes | cnt |
| Processes [Running] | Number of running processes | cnt |
| Processes [Sleeping] | Number of sleeping processes | cnt |
| Processes [Stopped] | Number of stopped processes | cnt |
| Processes [Total] | Total number of processes | cnt |
| Processes [Unknown] | Number of unknown processes | cnt |
| Processes [Zombie] | Number of zombie processes | cnt |
| Running Process Usage | Process usage rate | % |
| Running Processes | Number of running processes | cnt |
| Running Thread Usage | Thread usage rate | % |
| Running Threads | Number of running threads | cnt |
| Context Switches | Context switches per second | cnt |
| Load/Core [1 min] | Load per core over 1 minute | cnt |
| Load/Core [15 min] | Load per core over 15 minutes | cnt |
| Load/Core [5 min] | Load per core over 5 minutes | cnt |
| Multipaths [Active] | Number of active multipath connections | cnt |
| Multipaths [Failed] | Number of failed multipath connections | cnt |
| Multipaths [Faulty] | Number of faulty multipath connections | cnt |
| NTP Offset | Measured offset from the NTP server | num |
| Run Queue Length | Run queue length | num |
| Uptime | System uptime in milliseconds | ms |
| Context Switches | Context switches per second | cnt |
| Disk Read Bytes [Sec] | Bytes read from the device per second | cnt |
| Disk Read Time [Avg] | Average read time from the device | sec |
| Disk Transfer Time [Avg] | Average disk transfer time | sec |
| Disk Usage | Disk usage rate | % |
| Disk Write Bytes [Sec] | Bytes written to the device per second | cnt |
| Disk Write Time [Avg] | Average write time to the device | sec |
| Pagingfile Usage | Paging file usage rate | % |
| Pool Used [Non Paged] | Non-paged pool usage | bytes |
| Pool Used [Paged] | Paged pool usage | bytes |
| Process [Running] | Number of running processes | cnt |
| Threads [Running] | Number of running threads | cnt |
| Threads [Waiting] | Number of waiting threads | cnt |

Table. GPU Server Additional Monitoring Metrics (Agent Installation Required)
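The [Delta …] variants above are derived from the change in a cumulative counter between successive samples. A minimal sketch of how those statistics relate to each other (values are illustrative; the actual collection pipeline is not documented here):

```python
# Cumulative "bytes read" counter readings at successive sample times.
samples = [1000, 1600, 2100, 3100, 3200]

# Delta = difference between successive counter readings.
deltas = [b - a for a, b in zip(samples, samples[1:])]  # [600, 500, 1000, 100]

stats = {
    "Delta Avg": sum(deltas) / len(deltas),
    "Delta Max": max(deltas),
    "Delta Min": min(deltas),
    "Delta Sum": sum(deltas),
}
print(stats)
# {'Delta Avg': 550.0, 'Delta Max': 1000, 'Delta Min': 100, 'Delta Sum': 2200}
```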

3 - ServiceWatch Metrics

GPU Server sends metrics to ServiceWatch. Basic monitoring provides data collected at 5-minute intervals; when detailed monitoring is enabled, you can view data collected at 1-minute intervals.

Notice
  • GPU Server's basic monitoring and detailed monitoring provide the same metrics as Virtual Server, and the namespace is also Virtual Server.
  • GPU-related metrics are provided through ServiceWatch Agent. For information on how to collect metrics using ServiceWatch Agent, refer to the ServiceWatch Agent guide.
Reference
To view metrics in ServiceWatch, refer to the ServiceWatch guide.

For information on how to enable detailed monitoring for GPU Server, refer to How-to guides > Enable ServiceWatch detailed monitoring.

Basic Metrics

The following are basic metrics for the namespace Virtual Server.

In the table below, metric names marked in bold are key metrics, selected from the basic metrics provided by Virtual Server. Key metrics are used to build the service dashboards that ServiceWatch configures automatically for each service.

For each metric, the Meaningful Statistics column lists the statistics that are meaningful when querying it; the statistic marked in bold is the key statistic. The service dashboard displays key metrics using their key statistics.

| Performance Item | Detailed Description | Unit | Meaningful Statistics |
|---|---|---|---|
| Instance State | Instance state (1: Active, 0: Off) | None | Sum |
| CPU Usage | CPU usage | Percent | Average, Maximum, Minimum |
| Disk Read Bytes | Amount read from the block device (bytes) | Bytes | Sum, Average, Maximum, Minimum |
| Disk Read Requests | Number of read requests from the block device | Count | Sum, Average, Maximum, Minimum |
| Disk Write Bytes | Amount written to the block device (bytes) | Bytes | Sum, Average, Maximum, Minimum |
| Disk Write Requests | Number of write requests to the block device | Count | Sum, Average, Maximum, Minimum |
| Network In Bytes | Amount received on the network interface (bytes) | Bytes | Sum, Average, Maximum, Minimum |
| Network In Dropped | Number of received packets dropped on the network interface | Count | Sum, Average, Maximum, Minimum |
| Network In Packets | Number of packets received on the network interface | Count | Sum, Average, Maximum, Minimum |
| Network Out Bytes | Amount transmitted on the network interface (bytes) | Bytes | Sum, Average, Maximum, Minimum |
| Network Out Dropped | Number of transmitted packets dropped on the network interface | Count | Sum, Average, Maximum, Minimum |
| Network Out Packets | Number of packets transmitted on the network interface | Count | Sum, Average, Maximum, Minimum |

Table. Virtual Server Basic Metrics
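Given samples collected at 1-minute intervals (detailed monitoring), the meaningful statistics for a metric such as CPU Usage are simple aggregates over the query period. An illustrative sketch (sample values are made up):

```python
# CPU Usage samples collected at 1-minute intervals (detailed monitoring).
cpu_samples = [12.0, 15.5, 11.0, 40.0, 38.5, 14.0]  # percent, illustrative

# The meaningful statistics for CPU Usage, per the table above.
statistics = {
    "Average": sum(cpu_samples) / len(cpu_samples),
    "Maximum": max(cpu_samples),
    "Minimum": min(cpu_samples),
}
print(statistics)
```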