The page has been translated by Gen AI.

Monitoring Metrics

GPU Server Monitoring Metrics

The following table shows the monitoring metrics of the GPU Server that can be checked through Cloud Monitoring.

Basic monitoring metrics are provided even without installing an Agent; see Table. GPU Server Basic Monitoring Metrics (Basic) below. Metrics that become available only after installing an Agent are listed in Table. GPU Server Additional Monitoring Metrics (Agent Installation Required) below.

For detailed Cloud Monitoring usage, please refer to the Cloud Monitoring guide.

| Performance Item Name | Description | Unit |
| --- | --- | --- |
| Memory Total [Basic] | Total available memory in bytes | bytes |
| Memory Used [Basic] | Currently used memory in bytes | bytes |
| Memory Swap In [Basic] | Bytes of memory swapped in | bytes |
| Memory Swap Out [Basic] | Bytes of memory swapped out | bytes |
| Memory Free [Basic] | Unused memory in bytes | bytes |
| Disk Read Bytes [Basic] | Bytes read | bytes |
| Disk Read Requests [Basic] | Number of read requests | cnt |
| Disk Write Bytes [Basic] | Bytes written | bytes |
| Disk Write Requests [Basic] | Number of write requests | cnt |
| CPU Usage [Basic] | Average system CPU usage over 1 minute | % |
| Instance State [Basic] | Instance state | state |
| Network In Bytes [Basic] | Bytes received | bytes |
| Network In Dropped [Basic] | Received packets dropped | cnt |
| Network In Packets [Basic] | Packets received | cnt |
| Network Out Bytes [Basic] | Bytes sent | bytes |
| Network Out Dropped [Basic] | Sent packets dropped | cnt |
| Network Out Packets [Basic] | Packets sent | cnt |
Table. GPU Server Basic Monitoring Metrics (Basic)
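The basic table reports memory as raw byte counts only, so a usage percentage has to be derived by the reader. The following is a minimal sketch of that arithmetic, not part of Cloud Monitoring itself: the function name `usage_percent` and the sample values are hypothetical, and the only inputs assumed are the Memory Used [Basic] and Memory Total [Basic] values from the table above.

```python
def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used/total as a percentage, or 0.0 when total is not positive."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


# Hypothetical sample values, not real Cloud Monitoring output.
sample = {
    "Memory Total [Basic]": 64 * 1024**3,  # 64 GiB expressed in bytes
    "Memory Used [Basic]": 16 * 1024**3,   # 16 GiB expressed in bytes
}

memory_usage = usage_percent(
    sample["Memory Used [Basic]"], sample["Memory Total [Basic]"]
)
print(f"Memory usage: {memory_usage:.1f}%")  # prints "Memory usage: 25.0%"
```

The guard against a non-positive total avoids a division error if a sample arrives before the total has been reported.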
| Performance Item Name | Description | Unit |
| --- | --- | --- |
| GPU Count | Number of GPUs | cnt |
| GPU Memory Usage | GPU memory usage rate | % |
| GPU Memory Used | Used GPU memory | MB |
| GPU Temperature | GPU temperature | |
| GPU Usage | GPU utilization | % |
| GPU Usage [Avg] | Average GPU usage rate | % |
| GPU Power Cap | Maximum power capacity of the GPU | W |
| GPU Power Usage | Current power usage of the GPU | W |
| GPU Memory Usage [Avg] | Average GPU memory usage rate | % |
| GPU Count in use | Number of GPUs in use by jobs on the node | cnt |
| Execution Status for nvidia-smi | Execution result of the nvidia-smi command | status |
| Core Usage [IO Wait] | CPU time spent in IO wait state | % |
| Core Usage [System] | CPU time spent in system space | % |
| Core Usage [User] | CPU time spent in user space | % |
| CPU Cores | Number of CPU cores on the host | cnt |
| CPU Usage [Active] | CPU time used, excluding idle and IO wait states | % |
| CPU Usage [Idle] | CPU time spent in idle state | % |
| CPU Usage [IO Wait] | CPU time spent in IO wait state | % |
| CPU Usage [System] | CPU time used by the kernel | % |
| CPU Usage [User] | CPU time used by user space | % |
| CPU Usage/Core [Active] | CPU time used per core, excluding idle and IO wait states | % |
| CPU Usage/Core [Idle] | CPU time spent in idle state per core | % |
| CPU Usage/Core [IO Wait] | CPU time spent in IO wait state per core | % |
| CPU Usage/Core [System] | CPU time used by the kernel per core | % |
| CPU Usage/Core [User] | CPU time used by user space per core | % |
| Disk CPU Usage [IO Request] | CPU time spent on IO requests | % |
| Disk Queue Size [Avg] | Average queue length of requests | num |
| Disk Read Bytes | Bytes read from the device per second | bytes |
| Disk Read Bytes [Delta Avg] | Average delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Max] | Maximum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Min] | Minimum delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta Sum] | Sum of delta of bytes read from the device | bytes |
| Disk Read Bytes [Delta] | Delta of bytes read from the device | bytes |
| Disk Read Bytes [Success] | Total bytes successfully read | bytes |
| Disk Read Requests | Number of read requests to the device per second | cnt |
| Disk Read Requests [Delta Avg] | Average delta of read requests to the device | cnt |
| Disk Read Requests [Delta Max] | Maximum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Min] | Minimum delta of read requests to the device | cnt |
| Disk Read Requests [Delta Sum] | Sum of delta of read requests to the device | cnt |
| Disk Read Requests [Success Delta] | Delta of successful read requests to the device | cnt |
| Disk Read Requests [Success] | Total successful read requests | cnt |
| Disk Request Size [Avg] | Average size of requests to the device | num |
| Disk Service Time [Avg] | Average service time of requests to the device | ms |
| Disk Wait Time [Avg] | Average wait time of requests to the device | ms |
| Disk Wait Time [Read] | Average read wait time of the device | ms |
| Disk Wait Time [Write] | Average write wait time of the device | ms |
| Disk Write Bytes [Delta Avg] | Average delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Max] | Maximum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Min] | Minimum delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta Sum] | Sum of delta of bytes written to the device | bytes |
| Disk Write Bytes [Delta] | Delta of bytes written to the device | bytes |
| Disk Write Bytes [Success] | Total bytes successfully written | bytes |
| Disk Write Requests | Number of write requests to the device per second | cnt |
| Disk Write Requests [Delta Avg] | Average delta of write requests to the device | cnt |
| Disk Write Requests [Delta Max] | Maximum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Min] | Minimum delta of write requests to the device | cnt |
| Disk Write Requests [Delta Sum] | Sum of delta of write requests to the device | cnt |
| Disk Write Requests [Success Delta] | Delta of successful write requests to the device | cnt |
| Disk Write Requests [Success] | Total successful write requests | cnt |
| Disk Writes Bytes | Bytes written to the device per second | bytes |
| Filesystem Hang Check | Filesystem hang check (normal: 1, abnormal: 0) | status |
| Filesystem Nodes | Total number of filesystem nodes | cnt |
| Filesystem Nodes [Free] | Total number of available filesystem nodes | cnt |
| Filesystem Size [Available] | Available disk space in bytes | bytes |
| Filesystem Size [Free] | Free disk space in bytes | bytes |
| Filesystem Size [Total] | Total disk space in bytes | bytes |
| Filesystem Usage | Disk space usage rate | % |
| Filesystem Usage [Avg] | Average disk space usage rate | % |
| Filesystem Usage [Inode] | Inode usage rate | % |
| Filesystem Usage [Max] | Maximum disk space usage rate | % |
| Filesystem Usage [Min] | Minimum disk space usage rate | % |
| Filesystem Usage [Total] | Total disk space usage rate | % |
| Filesystem Used | Used disk space in bytes | bytes |
| Filesystem Used [Inode] | Used inode space in bytes | bytes |
| Memory Free | Total available memory in bytes | bytes |
| Memory Free [Actual] | Actual available memory in bytes | bytes |
| Memory Free [Swap] | Available swap memory in bytes | bytes |
| Memory Total | Total memory in bytes | bytes |
| Memory Total [Swap] | Total swap memory in bytes | bytes |
| Memory Usage | Memory usage rate | % |
| Memory Usage [Actual] | Actual memory usage rate | % |
| Memory Usage [Cache Swap] | Cache swap usage rate | % |
| Memory Usage [Swap] | Swap memory usage rate | % |
| Memory Used | Used memory in bytes | bytes |
| Memory Used [Actual] | Actual used memory in bytes | bytes |
| Memory Used [Swap] | Used swap memory in bytes | bytes |
| Collisions | Network collisions | cnt |
| Network In Bytes | Received bytes | bytes |
| Network In Bytes [Delta Avg] | Average delta of received bytes | bytes |
| Network In Bytes [Delta Max] | Maximum delta of received bytes | bytes |
| Network In Bytes [Delta Min] | Minimum delta of received bytes | bytes |
| Network In Bytes [Delta Sum] | Sum of delta of received bytes | bytes |
| Network In Bytes [Delta] | Delta of received bytes | bytes |
| Network In Dropped | Dropped received packets | cnt |
| Network In Errors | Received errors | cnt |
| Network In Packets | Received packets | cnt |
| Network In Packets [Delta Avg] | Average delta of received packets | cnt |
| Network In Packets [Delta Max] | Maximum delta of received packets | cnt |
| Network In Packets [Delta Min] | Minimum delta of received packets | cnt |
| Network In Packets [Delta Sum] | Sum of delta of received packets | cnt |
| Network In Packets [Delta] | Delta of received packets | cnt |
| Network Out Bytes | Sent bytes | bytes |
| Network Out Bytes [Delta Avg] | Average delta of sent bytes | bytes |
| Network Out Bytes [Delta Max] | Maximum delta of sent bytes | bytes |
| Network Out Bytes [Delta Min] | Minimum delta of sent bytes | bytes |
| Network Out Bytes [Delta Sum] | Sum of delta of sent bytes | bytes |
| Network Out Bytes [Delta] | Delta of sent bytes | bytes |
| Network Out Dropped | Dropped sent packets | cnt |
| Network Out Errors | Sent errors | cnt |
| Network Out Packets | Sent packets | cnt |
| Network Out Packets [Delta Avg] | Average delta of sent packets | cnt |
| Network Out Packets [Delta Max] | Maximum delta of sent packets | cnt |
| Network Out Packets [Delta Min] | Minimum delta of sent packets | cnt |
| Network Out Packets [Delta Sum] | Sum of delta of sent packets | cnt |
| Network Out Packets [Delta] | Delta of sent packets | cnt |
| Open Connections [TCP] | Open TCP connections | cnt |
| Open Connections [UDP] | Open UDP connections | cnt |
| Port Usage | Port usage rate | % |
| SYN Sent Sockets | Number of sockets in SYN_SENT state | cnt |
| Kernel PID Max | Maximum PID value | cnt |
| Kernel Thread Max | Maximum thread value | cnt |
| Process CPU Usage | CPU time used by the process | % |
| Process CPU Usage/Core | CPU time used by the process per core | % |
| Process Memory Usage | Resident Set size | % |
| Process Memory Used | Used memory by the process | bytes |
| Process PID | Process ID | PID |
| Process PPID | Parent process ID | PID |
| Processes [Dead] | Number of dead processes | cnt |
| Processes [Idle] | Number of idle processes | cnt |
| Processes [Running] | Number of running processes | cnt |
| Processes [Sleeping] | Number of sleeping processes | cnt |
| Processes [Stopped] | Number of stopped processes | cnt |
| Processes [Total] | Total number of processes | cnt |
| Processes [Unknown] | Number of unknown processes | cnt |
| Processes [Zombie] | Number of zombie processes | cnt |
| Running Process Usage | Process usage rate | % |
| Running Processes | Number of running processes | cnt |
| Running Thread Usage | Thread usage rate | % |
| Running Threads | Number of running threads | cnt |
| Context Switches | Context switches per second | cnt |
| Load/Core [1 min] | Load per core over 1 minute | cnt |
| Load/Core [15 min] | Load per core over 15 minutes | cnt |
| Load/Core [5 min] | Load per core over 5 minutes | cnt |
| Multipaths [Active] | Number of active multipath connections | cnt |
| Multipaths [Failed] | Number of failed multipath connections | cnt |
| Multipaths [Faulty] | Number of faulty multipath connections | cnt |
| NTP Offset | Measured offset from the NTP server | num |
| Run Queue Length | Run queue length | num |
| Uptime | System uptime in milliseconds | ms |
| Context Switchies | Context switches per second | cnt |
| Disk Read Bytes [Sec] | Bytes read from the device per second | cnt |
| Disk Read Time [Avg] | Average read time from the device | sec |
| Disk Transfer Time [Avg] | Average disk transfer time | sec |
| Disk Usage | Disk usage rate | % |
| Disk Write Bytes [Sec] | Bytes written to the device per second | cnt |
| Disk Write Time [Avg] | Average write time to the device | sec |
| Pagingfile Usage | Paging file usage rate | % |
| Pool Used [Non Paged] | Non-paged pool usage | bytes |
| Pool Used [Paged] | Paged pool usage | bytes |
| Process [Running] | Number of running processes | cnt |
| Threads [Running] | Number of running threads | cnt |
| Threads [Waiting] | Number of waiting threads | cnt |
Table. GPU Server Additional Monitoring Metrics (Agent Installation Required)
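Many of the Agent metrics come in [Delta], [Delta Avg], [Delta Max], [Delta Min], and [Delta Sum] variants, but this page does not define how they are computed. The sketch below shows one plausible reading, under the assumption that a delta is the difference between two consecutive samples of a cumulative counter and that the aggregates are taken over a window of such deltas. The function name `delta_stats` and the sample readings are hypothetical and do not come from Cloud Monitoring.

```python
from statistics import mean
from typing import Dict, Sequence


def delta_stats(samples: Sequence[float]) -> Dict[str, float]:
    """Summarise the differences between consecutive counter samples.

    Assumption: `samples` are readings of a cumulative counter
    (for example, total bytes read) taken at regular intervals.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a delta")
    # Pair each sample with its successor and take the difference.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return {
        "delta": deltas[-1],        # most recent change
        "delta_avg": mean(deltas),  # average change over the window
        "delta_max": max(deltas),   # largest change over the window
        "delta_min": min(deltas),   # smallest change over the window
        "delta_sum": sum(deltas),   # total change over the window
    }


# Hypothetical cumulative "bytes read" readings, one per collection interval.
readings = [1000, 1500, 1700, 2600]
stats = delta_stats(readings)
print(stats["delta_max"], stats["delta_min"], stats["delta_sum"])  # prints "900 200 1600"
```

Under this reading, the [Delta Sum] over a window equals the last sample minus the first, which is a quick sanity check when comparing against raw counter values.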