1 - Server Type
GPU Server Server Type
GPU Servers are classified by the GPU type they provide. The GPU used in a GPU Server is determined by the server type you select when creating it, so select a server type that matches the specifications of the application you want to run.
The server types supported by the GPU Server are as follows.
GPU-H100-2 g2v12h1
| Category | Example | Detailed description |
|---|---|---|
| Server Type | GPU-H100-2 | Provided server type classification |
| Server specifications | g2 | Provided server type classification and generation |
| Server specifications | v12 | Number of vCores |
| Server specifications | h1 | GPU type and quantity |
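The naming scheme above can be split mechanically into its parts. The following is a minimal sketch, not an official tool: the function name `parse_server_type`, the `ServerSpec` type, and the regular expression are assumptions inferred only from the example names on this page (such as `g2v12h1`).

```python
import re
from typing import NamedTuple


class ServerSpec(NamedTuple):
    """Parts of a server type name (hypothetical helper type)."""
    generation: int   # "g2"  -> 2
    vcores: int       # "v12" -> 12
    gpu_code: str     # "h1"  -> "h" (GPU type letter)
    gpu_count: int    # "h1"  -> 1


# Assumed pattern: g<generation>v<vCores><GPU letter><GPU count>
_PATTERN = re.compile(r"^g(\d+)v(\d+)([a-z])(\d+)$")


def parse_server_type(name: str) -> ServerSpec:
    """Split a name such as 'g2v12h1' into its components."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized server type name: {name!r}")
    generation, vcores, gpu_code, gpu_count = match.groups()
    return ServerSpec(int(generation), int(vcores), gpu_code, int(gpu_count))


if __name__ == "__main__":
    print(parse_server_type("g2v12h1"))
```

Names that do not follow the assumed pattern raise `ValueError` rather than being guessed at.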
g1 server type
The g1 server type is a GPU Server that uses NVIDIA A100 Tensor Core GPUs and is suited to high-performance applications.
- Provides up to 8 NVIDIA A100 Tensor Core GPUs
- Equipped with 6,912 CUDA cores and 432 Tensor cores per GPU
- Supports up to 128 vCPUs and 1,920 GB of memory
- Up to 40 Gbps networking speed
- 600 GB/s GPU-to-GPU P2P communication over NVIDIA NVSwitch
| Category | Server Type | GPU | CPU | Memory | GPU Memory | Network Bandwidth |
|---|---|---|---|---|---|---|
| GPU-A100-1 | g1v16a1 | 1 | 16 vCore | 234 GB | 80 GB | up to 20 Gbps |
| GPU-A100-1 | g1v32a2 | 2 | 32 vCore | 468 GB | 160 GB | up to 20 Gbps |
| GPU-A100-1 | g1v64a4 | 4 | 64 vCore | 936 GB | 320 GB | up to 40 Gbps |
| GPU-A100-1 | g1v128a8 | 8 | 128 vCore | 1872 GB | 640 GB | up to 40 Gbps |
g2 server type
The g2 server type is a GPU Server that uses NVIDIA H100 Tensor Core GPUs and is suited to high-performance applications.
- Provides up to 8 NVIDIA H100 Tensor Core GPUs
- Equipped with 16,896 CUDA cores and 528 Tensor cores per GPU
- Supports up to 96 vCPUs and 1,920 GB of memory
- Up to 40 Gbps networking speed
- 900 GB/s GPU-to-GPU P2P communication over NVIDIA NVSwitch
| Category | Server Type | GPU | CPU | Memory | GPU Memory | Network Bandwidth |
|---|---|---|---|---|---|---|
| GPU-H100-2 | g2v12h1 | 1 | 12 vCore | 234 GB | 80 GB | up to 20 Gbps |
| GPU-H100-2 | g2v24h2 | 2 | 24 vCore | 468 GB | 160 GB | up to 20 Gbps |
| GPU-H100-2 | g2v48h4 | 4 | 48 vCore | 936 GB | 320 GB | up to 40 Gbps |
| GPU-H100-2 | g2v96h8 | 8 | 96 vCore | 1872 GB | 640 GB | up to 40 Gbps |
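When choosing a server type for an application, the two tables above can be treated as a small catalog. The sketch below is illustrative only: the `CATALOG` values are copied from the tables on this page, while the `smallest_fit` function, its parameters, and the "fewest GPUs, then fewest vCores" tie-break are assumptions, not a platform API.

```python
from typing import NamedTuple, Optional


class ServerType(NamedTuple):
    name: str
    gpus: int
    vcores: int
    memory_gb: int
    gpu_memory_gb: int


# Values copied from the g1 and g2 tables above.
CATALOG = [
    ServerType("g1v16a1", 1, 16, 234, 80),
    ServerType("g1v32a2", 2, 32, 468, 160),
    ServerType("g1v64a4", 4, 64, 936, 320),
    ServerType("g1v128a8", 8, 128, 1872, 640),
    ServerType("g2v12h1", 1, 12, 234, 80),
    ServerType("g2v24h2", 2, 24, 468, 160),
    ServerType("g2v48h4", 4, 48, 936, 320),
    ServerType("g2v96h8", 8, 96, 1872, 640),
]


def smallest_fit(min_gpu_memory_gb: int,
                 min_vcores: int = 0,
                 generation: Optional[str] = None) -> Optional[ServerType]:
    """Return the smallest catalog entry meeting the requirements, or None.

    `generation` is an optional name prefix such as "g1" or "g2".
    "Smallest" is an assumed ordering: fewest GPUs, then fewest vCores.
    """
    candidates = [
        s for s in CATALOG
        if s.gpu_memory_gb >= min_gpu_memory_gb
        and s.vcores >= min_vcores
        and (generation is None or s.name.startswith(generation))
    ]
    return min(candidates, key=lambda s: (s.gpus, s.vcores), default=None)


if __name__ == "__main__":
    print(smallest_fit(200))                   # needs at least 200 GB of GPU memory
    print(smallest_fit(200, generation="g1"))  # same, restricted to A100 types
```

A real selection would also weigh price and availability, which this page does not list.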
2 - Monitoring Metrics
GPU Server Monitoring Metrics
The following table shows the monitoring metrics of the GPU Server that can be checked through Cloud Monitoring.
Basic monitoring metrics are provided even without installing an Agent; see the table "GPU Server Monitoring Metrics (Basic)" below. Metrics that become available after installing an Agent are listed in the table "GPU Server Additional Monitoring Metrics (Agent Installation Required)" below.
For detailed Cloud Monitoring usage, please refer to the Cloud Monitoring guide.
Table. GPU Server Monitoring Metrics (Basic)

| Performance Item Name | Description | Unit |
|---|---|---|
| Memory Total [Basic] | Total available memory in bytes | bytes |
| Memory Used [Basic] | Currently used memory in bytes | bytes |
| Memory Swap In [Basic] | Memory swapped in, in bytes | bytes |
| Memory Swap Out [Basic] | Memory swapped out, in bytes | bytes |
| Memory Free [Basic] | Unused memory in bytes | bytes |
| Disk Read Bytes [Basic] | Read bytes | bytes |
| Disk Read Requests [Basic] | Number of read requests | cnt |
| Disk Write Bytes [Basic] | Written bytes | bytes |
| Disk Write Requests [Basic] | Number of write requests | cnt |
| CPU Usage [Basic] | Average system CPU usage over 1 minute | % |
| Instance State [Basic] | Instance state | state |
| Network In Bytes [Basic] | Received bytes | bytes |
| Network In Dropped [Basic] | Dropped received packets | cnt |
| Network In Packets [Basic] | Received packets | cnt |
| Network Out Bytes [Basic] | Sent bytes | bytes |
| Network Out Dropped [Basic] | Dropped sent packets | cnt |
| Network Out Packets [Basic] | Sent packets | cnt |

Table. GPU Server Additional Monitoring Metrics (Agent Installation Required)

| Performance Item Name | Description | Unit |
|---|---|---|
| GPU Count | Number of GPUs | cnt |
| GPU Memory Usage | GPU memory usage rate | % |
| GPU Memory Used | Used GPU memory | MB |
| GPU Temperature | GPU temperature | ℃ |
| GPU Usage | GPU utilization | % |
| GPU Usage [Avg] | Average GPU usage rate | % |
| GPU Power Cap | Maximum power capacity of the GPU | W |
| GPU Power Usage | Current power usage of the GPU | W |
| GPU Memory Usage [Avg] | Average GPU memory usage rate | % |
| GPU Count in use | Number of GPUs in use by jobs on the node | cnt |
| Execution Status for nvidia-smi | Execution result of the nvidia-smi command | status |
| Core Usage [IO Wait] | CPU time spent in IO wait state | % |
| Core Usage [System] | CPU time spent in system space | % |
| Core Usage [User] | CPU time spent in user space | % |
| CPU Cores | Number of CPU cores on the host | cnt |
| CPU Usage [Active] | CPU time used, excluding idle and IO wait states | % |
| CPU Usage [Idle] | CPU time spent in idle state | % |
| CPU Usage [IO Wait] | CPU time spent in IO wait state | % |
| CPU Usage [System] | CPU time used by the kernel | % |
| CPU Usage [User] | CPU time used by user space | % |
| CPU Usage/Core [Active] | CPU time used per core, excluding idle and IO wait states | % |
| CPU Usage/Core [Idle] | CPU time spent in idle state per core | % |
| CPU Usage/Core [IO Wait] | CPU time spent in IO wait state per core | % |
| CPU Usage/Core [System] | CPU time used by the kernel per core | % |
| CPU Usage/Core [User] | CPU time used by user space per core | % |
| Disk CPU Usage [IO Request] | CPU time spent on IO requests | % |
| Disk Queue Size [Avg] | Average queue length of requests | num |
| Disk Read Bytes | Bytes read from the device per second | bytes |
| Disk Read Bytes [Delta Avg] | Average delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Max] | Maximum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Min] | Minimum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Sum] | Sum of delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta] | Delta of bytes read from the device | bytes |
| Disk Read Bytes [Success] | Total bytes successfully read | bytes |
| Disk Read Requests | Number of read requests to the device per second | cnt |
| Disk Read Requests [Delta Avg] | Average delta of read requests to the device | cnt |
| Disk Read Requests [Delta Max] | Maximum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Min] | Minimum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Sum] | Sum of delta of read requests to the device | cnt |
| Disk Read Requests [Success Delta] | Delta of successful read requests to the device | cnt |
| Disk Read Requests [Success] | Total successful read requests | cnt |
| Disk Request Size [Avg] | Average size of requests to the device | num |
| Disk Service Time [Avg] | Average service time of requests to the device | ms |
| Disk Wait Time [Avg] | Average wait time of requests to the device | ms |
| Disk Wait Time [Read] | Average read wait time of the device | ms |
| Disk Wait Time [Write] | Average write wait time of the device | ms |
| Disk Write Bytes [Delta Avg] | Average delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Max] | Maximum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Min] | Minimum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Sum] | Sum of delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta] | Delta of bytes written to the device | bytes |
| Disk Write Bytes [Success] | Total bytes successfully written | bytes |
| Disk Write Requests | Number of write requests to the device per second | cnt |
| Disk Write Requests [Delta Avg] | Average delta of write requests to the device | cnt |
| Disk Write Requests [Delta Max] | Maximum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Min] | Minimum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Sum] | Sum of delta of write requests to the device | cnt |
| Disk Write Requests [Success Delta] | Delta of successful write requests to the device | cnt |
| Disk Write Requests [Success] | Total successful write requests | cnt |
| Disk Write Bytes | Bytes written to the device per second | bytes |
| Filesystem Hang Check | Filesystem hang check (normal: 1, abnormal: 0) | status |
| Filesystem Nodes | Total number of filesystem nodes | cnt |
| Filesystem Nodes [Free] | Total number of available filesystem nodes | cnt |
| Filesystem Size [Available] | Available disk space in bytes | bytes |
| Filesystem Size [Free] | Free disk space in bytes | bytes |
| Filesystem Size [Total] | Total disk space in bytes | bytes |
| Filesystem Usage | Disk space usage rate | % |
| Filesystem Usage [Avg] | Average disk space usage rate | % |
| Filesystem Usage [Inode] | Inode usage rate | % |
| Filesystem Usage [Max] | Maximum disk space usage rate | % |
| Filesystem Usage [Min] | Minimum disk space usage rate | % |
| Filesystem Usage [Total] | Total disk space usage rate | % |
| Filesystem Used | Used disk space in bytes | bytes |
| Filesystem Used [Inode] | Used inode space in bytes | bytes |
| Memory Free | Total available memory in bytes | bytes |
| Memory Free [Actual] | Actual available memory in bytes | bytes |
| Memory Free [Swap] | Available swap memory in bytes | bytes |
| Memory Total | Total memory in bytes | bytes |
| Memory Total [Swap] | Total swap memory in bytes | bytes |
| Memory Usage | Memory usage rate | % |
| Memory Usage [Actual] | Actual memory usage rate | % |
| Memory Usage [Cache Swap] | Cache swap usage rate | % |
| Memory Usage [Swap] | Swap memory usage rate | % |
| Memory Used | Used memory in bytes | bytes |
| Memory Used [Actual] | Actual used memory in bytes | bytes |
| Memory Used [Swap] | Used swap memory in bytes | bytes |
| Collisions | Network collisions | cnt |
| Network In Bytes | Received bytes | bytes |
| Network In Bytes [Delta Avg] | Average delta of received bytes | bytes |
| Network In Bytes [Delta Max] | Maximum delta of received bytes | bytes |
| Network In Bytes [Delta Min] | Minimum delta of received bytes | bytes |
| Network In Bytes [Delta Sum] | Sum of delta of received bytes | bytes |
| Network In Bytes [Delta] | Delta of received bytes | bytes |
| Network In Dropped | Dropped received packets | cnt |
| Network In Errors | Received errors | cnt |
| Network In Packets | Received packets | cnt |
| Network In Packets [Delta Avg] | Average delta of received packets | cnt |
| Network In Packets [Delta Max] | Maximum delta of received packets | cnt |
| Network In Packets [Delta Min] | Minimum delta of received packets | cnt |
| Network In Packets [Delta Sum] | Sum of delta of received packets | cnt |
| Network In Packets [Delta] | Delta of received packets | cnt |
| Network Out Bytes | Sent bytes | bytes |
| Network Out Bytes [Delta Avg] | Average delta of sent bytes | bytes |
| Network Out Bytes [Delta Max] | Maximum delta of sent bytes | bytes |
| Network Out Bytes [Delta Min] | Minimum delta of sent bytes | bytes |
| Network Out Bytes [Delta Sum] | Sum of delta of sent bytes | bytes |
| Network Out Bytes [Delta] | Delta of sent bytes | bytes |
| Network Out Dropped | Dropped sent packets | cnt |
| Network Out Errors | Sent errors | cnt |
| Network Out Packets | Sent packets | cnt |
| Network Out Packets [Delta Avg] | Average delta of sent packets | cnt |
| Network Out Packets [Delta Max] | Maximum delta of sent packets | cnt |
| Network Out Packets [Delta Min] | Minimum delta of sent packets | cnt |
| Network Out Packets [Delta Sum] | Sum of delta of sent packets | cnt |
| Network Out Packets [Delta] | Delta of sent packets | cnt |
| Open Connections [TCP] | Open TCP connections | cnt |
| Open Connections [UDP] | Open UDP connections | cnt |
| Port Usage | Port usage rate | % |
| SYN Sent Sockets | Number of sockets in SYN_SENT state | cnt |
| Kernel PID Max | Maximum PID value | cnt |
| Kernel Thread Max | Maximum thread value | cnt |
| Process CPU Usage | CPU time used by the process | % |
| Process CPU Usage/Core | CPU time used by the process per core | % |
| Process Memory Usage | Share of memory used by the process (resident set size) | % |
| Process Memory Used | Used memory by the process | bytes |
| Process PID | Process ID | PID |
| Process PPID | Parent process ID | PID |
| Processes [Dead] | Number of dead processes | cnt |
| Processes [Idle] | Number of idle processes | cnt |
| Processes [Running] | Number of running processes | cnt |
| Processes [Sleeping] | Number of sleeping processes | cnt |
| Processes [Stopped] | Number of stopped processes | cnt |
| Processes [Total] | Total number of processes | cnt |
| Processes [Unknown] | Number of unknown processes | cnt |
| Processes [Zombie] | Number of zombie processes | cnt |
| Running Process Usage | Process usage rate | % |
| Running Processes | Number of running processes | cnt |
| Running Thread Usage | Thread usage rate | % |
| Running Threads | Number of running threads | cnt |
| Context Switches | Context switches per second | cnt |
| Load/Core [1 min] | Load per core over 1 minute | cnt |
| Load/Core [15 min] | Load per core over 15 minutes | cnt |
| Load/Core [5 min] | Load per core over 5 minutes | cnt |
| Multipaths [Active] | Number of active multipath connections | cnt |
| Multipaths [Failed] | Number of failed multipath connections | cnt |
| Multipaths [Faulty] | Number of faulty multipath connections | cnt |
| NTP Offset | Measured offset from the NTP server | num |
| Run Queue Length | Run queue length | num |
| Uptime | System uptime in milliseconds | ms |
| Context Switches | Context switches per second | cnt |
| Disk Read Bytes [Sec] | Bytes read from the device per second | cnt |
| Disk Read Time [Avg] | Average read time from the device | sec |
| Disk Transfer Time [Avg] | Average disk transfer time | sec |
| Disk Usage | Disk usage rate | % |
| Disk Write Bytes [Sec] | Bytes written to the device per second | cnt |
| Disk Write Time [Avg] | Average write time to the device | sec |
| Pagingfile Usage | Paging file usage rate | % |
| Pool Used [Non Paged] | Non-paged pool usage | bytes |
| Pool Used [Paged] | Paged pool usage | bytes |
| Process [Running] | Number of running processes | cnt |
| Threads [Running] | Number of running threads | cnt |
| Threads [Waiting] | Number of waiting threads | cnt |
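Many Agent metrics above come in `[Delta]`, `[Delta Avg]`, `[Delta Max]`, `[Delta Min]`, and `[Delta Sum]` variants. This page does not define how they are computed. The sketch below shows one plausible reading, assuming each delta is the difference between consecutive samples of a cumulative counter; the function name `delta_stats` and the sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def delta_stats(samples: Sequence[float]) -> Dict[str, float]:
    """Summarize the changes between consecutive counter samples.

    Assumption: `samples` are readings of a cumulative counter
    (for example, total bytes read), ordered oldest to newest.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a delta")
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return {
        "delta": deltas[-1],   # most recent change
        "avg": mean(deltas),   # average change per sample interval
        "max": max(deltas),
        "min": min(deltas),
        "sum": sum(deltas),    # total change over the window
    }


if __name__ == "__main__":
    # Hypothetical cumulative "bytes read" readings.
    print(delta_stats([100, 150, 250, 300]))
```

If the Agent reports per-interval values rather than cumulative counters, the subtraction step would not apply; check the Cloud Monitoring guide for the actual definition.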
3 - ServiceWatch Metrics
GPU Server sends metrics to ServiceWatch. With basic monitoring, metrics are collected at 5-minute intervals. If detailed monitoring is enabled, metrics are collected at 1-minute intervals.
- Basic and detailed monitoring for GPU Server provide the same metrics as Virtual Server, and the metrics are published under the Virtual Server namespace.
- GPU-related metrics are provided through the ServiceWatch Agent. For how to collect metrics with the ServiceWatch Agent, refer to the ServiceWatch Agent guide.
To enable detailed monitoring of GPU Server, refer to How-to guides > ServiceWatch Enable Detailed Monitoring.
Basic Metrics
The following are the basic metrics for the Virtual Server namespace.
| Performance Item | Detailed Description | Unit | Meaningful Statistics |
|---|---|---|---|
| Instance State | Instance state | - | - |
| CPU Usage | CPU usage | % | |
| Disk Read Bytes | Bytes read from the block device | Bytes | |
| Disk Read Requests | Number of read requests to the block device | Count | |
| Disk Write Bytes | Bytes written to the block device | Bytes | |
| Disk Write Requests | Number of write requests to the block device | Count | |
| Network In Bytes | Bytes received on the network interface | Bytes | |
| Network In Dropped | Number of received packets dropped on the network interface | Count | |
| Network In Packets | Number of packets received on the network interface | Count | |
| Network Out Bytes | Bytes transmitted from the network interface | Bytes | |
| Network Out Dropped | Number of transmitted packets dropped on the network interface | Count | |
| Network Out Packets | Number of packets transmitted from the network interface | Count | |
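
Byte-count metrics such as Network In Bytes are easier to compare with the bandwidth figures of a server type once converted to a rate. The sketch below is a rough illustration, assuming a datapoint holds the number of bytes transferred during one collection interval (5 minutes for basic monitoring, 1 minute for detailed monitoring); the function and constant names are hypothetical and not part of ServiceWatch.

```python
BASIC_INTERVAL_SECONDS = 5 * 60   # basic monitoring: 5-minute intervals
DETAILED_INTERVAL_SECONDS = 60    # detailed monitoring: 1-minute intervals


def average_bits_per_second(byte_count: float,
                            interval_seconds: int = BASIC_INTERVAL_SECONDS) -> float:
    """Convert a per-interval byte count into an average bit rate.

    Assumption: `byte_count` is the total for exactly one interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return byte_count * 8 / interval_seconds


if __name__ == "__main__":
    # Hypothetical datapoint: 750 MB received in one 5-minute interval.
    rate = average_bits_per_second(750_000_000)
    print(f"{rate / 1e6:.1f} Mbps")
```

An average over 5 minutes hides short bursts, which is one reason to enable detailed monitoring when investigating network saturation.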