The page has been translated by Gen AI.

Monitoring Metrics

Cloud Monitoring service termination notice

In accordance with Samsung Cloud Platform policy, the Cloud Monitoring service is scheduled to be discontinued in September 2026.
After the September 2026 release, resource monitoring of Samsung Cloud Platform through Cloud Monitoring will no longer be possible.

As an alternative, you can continue resource monitoring with ServiceWatch, which was released in October 2025.
ServiceWatch replaces Cloud Monitoring and provides more modern and powerful features for a seamless monitoring environment.

If you are collecting metrics and logs through the Cloud Monitoring Agent, you need to switch to the ServiceWatch Agent.

For more details about ServiceWatch, please refer to ServiceWatch Overview.
For more details about the ServiceWatch Agent, please refer to ServiceWatch Agent.

Multi-node GPU Cluster Monitoring Metrics

The tables below show the monitoring metrics of a Multi-node GPU Cluster that can be viewed through Cloud Monitoring.

Guide
In a Multi-node GPU Cluster, users must install the Agent themselves, following the guide, to view monitoring metrics. For stable service use, be sure to install the Agent. For Agent installation instructions and details on using Cloud Monitoring, refer to the Cloud Monitoring guide.

Multi-node GPU Cluster [Cluster]

| Performance item | Detailed description | Unit |
|---|---|---|
| Memory Total [Basic] | Usable memory in bytes | bytes |
| Memory Used [Basic] | Current memory usage in bytes | bytes |
| Memory Swap In [Basic] | Bytes of memory swapped in | bytes |
| Memory Swap Out [Basic] | Bytes of memory swapped out | bytes |
| Memory Free [Basic] | Unused memory in bytes | bytes |
| Disk Read Bytes [Basic] | Bytes read | bytes |
| Disk Read Requests [Basic] | Number of read requests | cnt |
| Disk Write Bytes [Basic] | Bytes written | bytes |
| Disk Write Requests [Basic] | Number of write requests | cnt |
| CPU Usage [Basic] | Average system CPU usage over 1 minute | % |
| Instance State [Basic] | Instance status | state |
| Network In Bytes [Basic] | Bytes received | bytes |
| Network In Dropped [Basic] | Number of dropped incoming packets | cnt |
| Network In Packets [Basic] | Number of received packets | cnt |
| Network Out Bytes [Basic] | Bytes sent | bytes |
| Network Out Dropped [Basic] | Number of dropped outgoing packets | cnt |
| Network Out Packets [Basic] | Number of transmitted packets | cnt |
Table. Multi-node GPU Cluster [Cluster] Monitoring Metrics (provided by default)
| Performance item | Detailed description | Unit |
|---|---|---|
| Cluster GPU Count | Total number of GPUs in the cluster: the sum of the GPU counts of the nodes in the same GPU cluster | cnt |
| Cluster GPU Count In Use | Number of GPUs used by jobs in the cluster: the sum of the GPUs occupied by processes, parsed from the 'Processes:' section at the bottom of the nvidia-smi output of each node in the same GPU cluster | cnt |
| Cluster GPU Usage | Average GPU utilization in the cluster: the average of the GPU utilization values of the nodes in the same GPU cluster | % |
| Cluster GPU Memory Usage [Avg] | Average GPU memory utilization in the cluster: the average of the memory utilization values of the nodes in the same GPU cluster | % |
Table. Multi-node GPU Cluster [Cluster] Additional monitoring metrics (Agent installation required)
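
The cluster-level metrics above are described as sums and averages of per-node values. The following is a minimal sketch of that aggregation, not the Agent's actual implementation. The `NodeSample` class, its field names, and the sample values are hypothetical and used only for illustration. The average is taken as a plain mean over nodes, following the wording of the table; whether the service weights nodes by GPU count is not stated here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeSample:
    """Hypothetical per-node reading; field names are illustrative only."""
    gpu_count: int
    gpu_count_in_use: int
    gpu_usage_pct: float
    gpu_memory_usage_pct: float


def aggregate_cluster(nodes: list[NodeSample]) -> dict[str, float]:
    """Combine node readings into the four cluster-level metrics."""
    if not nodes:
        raise ValueError("at least one node sample is required")
    return {
        # Sums over all nodes in the same GPU cluster.
        "cluster_gpu_count": sum(n.gpu_count for n in nodes),
        "cluster_gpu_count_in_use": sum(n.gpu_count_in_use for n in nodes),
        # Plain (unweighted) averages of the per-node values.
        "cluster_gpu_usage_pct": mean(n.gpu_usage_pct for n in nodes),
        "cluster_gpu_memory_usage_pct": mean(n.gpu_memory_usage_pct for n in nodes),
    }


# Illustrative values only, not real measurements.
nodes = [
    NodeSample(gpu_count=8, gpu_count_in_use=8, gpu_usage_pct=90.0, gpu_memory_usage_pct=70.0),
    NodeSample(gpu_count=8, gpu_count_in_use=4, gpu_usage_pct=50.0, gpu_memory_usage_pct=30.0),
]
cluster = aggregate_cluster(nodes)
print(cluster)
```

With these two sample nodes, the sketch yields 16 GPUs in total, 12 in use, and averages of 70% GPU utilization and 50% GPU memory utilization.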

Multi-node GPU Cluster [Node]

| Performance item | Detailed description | Unit |
|---|---|---|
| Memory Total [Basic] | Usable memory in bytes | bytes |
| Memory Used [Basic] | Current memory usage in bytes | bytes |
| Memory Swap In [Basic] | Bytes of memory swapped in | bytes |
| Memory Swap Out [Basic] | Bytes of memory swapped out | bytes |
| Memory Free [Basic] | Unused memory in bytes | bytes |
| Disk Read Bytes [Basic] | Bytes read | bytes |
| Disk Read Requests [Basic] | Number of read requests | cnt |
| Disk Write Bytes [Basic] | Bytes written | bytes |
| Disk Write Requests [Basic] | Number of write requests | cnt |
| CPU Usage [Basic] | Average system CPU usage over 1 minute | % |
| Instance State [Basic] | Instance status | state |
| Network In Bytes [Basic] | Bytes received | bytes |
| Network In Dropped [Basic] | Number of dropped incoming packets | cnt |
| Network In Packets [Basic] | Number of received packets | cnt |
| Network Out Bytes [Basic] | Bytes sent | bytes |
| Network Out Dropped [Basic] | Number of dropped outgoing packets | cnt |
| Network Out Packets [Basic] | Number of transmitted packets | cnt |
Table. Multi-node GPU Cluster [Node] Monitoring Metrics (provided by default)
| Performance item | Detailed description | Unit |
|---|---|---|
| GPU Count | Number of GPUs | cnt |
| GPU Temperature | GPU temperature | |
| GPU Usage | GPU utilization | % |
| GPU Usage [Avg] | Overall average GPU utilization (%) | % |
| GPU Power Cap | Maximum power capacity of the GPU | W |
| GPU Power Usage | Current GPU power usage | W |
| GPU Memory Usage [Avg] | Average GPU memory utilization | % |
| GPU Count in use | Number of GPUs in use by jobs on the node | cnt |
| Execution Status for nvidia-smi | Result of running the nvidia-smi command | status |
| Core Usage [IO Wait] | Proportion of CPU time spent in a waiting state (disk wait) | % |
| Core Usage [System] | Proportion of CPU time spent in kernel space | % |
| Core Usage [User] | Proportion of CPU time spent in user space | % |
| CPU Cores | Number of CPU cores on the host. Non-normalized percentages have a maximum value of 100% multiplied by the number of cores. Normalized percentages already take this value into account and have a maximum value of 100%. | cnt |
| CPU Usage [Active] | Percentage of CPU time used, excluding the Idle and IOWait states (400% when all 4 cores are used at 100%) | % |
| CPU Usage [Idle] | Proportion of CPU time spent in the idle state | % |
| CPU Usage [IO Wait] | Proportion of CPU time spent in a waiting state (disk wait) | % |
| CPU Usage [System] | Percentage of CPU time used by the kernel (400% when all 4 cores are used at 100%) | % |
| CPU Usage [User] | Percentage of CPU time used in user space (400% when all 4 cores are used at 100%) | % |
| CPU Usage/Core [Active] | Percentage of CPU time used, excluding the Idle and IOWait states (normalized by the number of cores; 100% when all 4 cores are fully utilized) | % |
| CPU Usage/Core [Idle] | Proportion of CPU time spent in the idle state | % |
| CPU Usage/Core [IO Wait] | Proportion of CPU time spent in a waiting state (disk wait) | % |
| CPU Usage/Core [System] | Percentage of CPU time used by the kernel (normalized by the number of cores; 100% when all 4 cores are fully utilized) | % |
| CPU Usage/Core [User] | Percentage of CPU time used in user space (normalized by the number of cores; 100% when all 4 cores are fully utilized) | % |
| Disk CPU Usage [IO Request] | Proportion of CPU time during which I/O requests were issued to the device (device bandwidth utilization). The device is saturated when this value approaches 100%. | % |
| Disk Queue Size [Avg] | Average queue length of the requests issued to the device | num |
| Disk Read Bytes | Number of bytes read from the device per second | bytes |
| Disk Read Bytes [Delta Avg] | Average of system.diskio.read.bytes_delta across individual disks | bytes |
| Disk Read Bytes [Delta Max] | Maximum of system.diskio.read.bytes_delta across individual disks | bytes |
| Disk Read Bytes [Delta Min] | Minimum of system.diskio.read.bytes_delta across individual disks | bytes |
| Disk Read Bytes [Delta Sum] | Sum of system.diskio.read.bytes_delta across individual disks | bytes |
| Disk Read Bytes [Delta] | Delta of the system.diskio.read.bytes value for each disk | bytes |
| Disk Read Bytes [Success] | Total number of bytes successfully read. On Linux, this is the number of sectors read multiplied by an assumed sector size of 512. | bytes |
| Disk Read Requests | Number of read requests to the disk device per second | cnt |
| Disk Read Requests [Delta Avg] | Average of system.diskio.read.count_delta across individual disks | cnt |
| Disk Read Requests [Delta Max] | Maximum of system.diskio.read.count_delta across individual disks | cnt |
| Disk Read Requests [Delta Min] | Minimum of system.diskio.read.count_delta across individual disks | cnt |
| Disk Read Requests [Delta Sum] | Sum of system.diskio.read.count_delta across individual disks | cnt |
| Disk Read Requests [Success Delta] | Delta of system.diskio.read.count for each disk | cnt |
| Disk Read Requests [Success] | Total number of successful reads | cnt |
| Disk Request Size [Avg] | Average size of the requests issued to the device (unit: sectors) | num |
| Disk Service Time [Avg] | Average service time (ms) of I/O requests issued to the device | ms |
| Disk Wait Time [Avg] | Average time taken for requests issued to the device to be served | ms |
| Disk Wait Time [Read] | Average disk read wait time | ms |
| Disk Wait Time [Write] | Average disk write wait time | ms |
| Disk Write Bytes [Delta Avg] | Average of system.diskio.write.bytes_delta across individual disks | bytes |
| Disk Write Bytes [Delta Max] | Maximum of system.diskio.write.bytes_delta across individual disks | bytes |
| Disk Write Bytes [Delta Min] | Minimum of system.diskio.write.bytes_delta across individual disks | bytes |
| Disk Write Bytes [Delta Sum] | Sum of system.diskio.write.bytes_delta across individual disks | bytes |
| Disk Write Bytes [Delta] | Delta of the system.diskio.write.bytes value for each disk | bytes |
| Disk Write Bytes [Success] | Total number of bytes successfully written. On Linux, this is the number of sectors written multiplied by an assumed sector size of 512. | bytes |
| Disk Write Requests | Number of write requests to the disk device per second | cnt |
| Disk Write Requests [Delta Avg] | Average of system.diskio.write.count_delta across individual disks | cnt |
| Disk Write Requests [Delta Max] | Maximum of system.diskio.write.count_delta across individual disks | cnt |
| Disk Write Requests [Delta Min] | Minimum of system.diskio.write.count_delta across individual disks | cnt |
| Disk Write Requests [Delta Sum] | Sum of system.diskio.write.count_delta across individual disks | cnt |
| Disk Write Requests [Success Delta] | Delta of system.diskio.write.count for each disk | cnt |
| Disk Write Requests [Success] | Total number of successful writes | cnt |
| Disk Writes Bytes | Number of bytes written to the device per second | bytes |
| Filesystem Hang Check | Filesystem (local/NFS) hang check (normal: 1, abnormal: 0) | status |
| Filesystem Nodes | Total number of file nodes in the file system | cnt |
| Filesystem Nodes [Free] | Total number of available file nodes in the file system | cnt |
| Filesystem Size [Available] | Disk space (bytes) available to unprivileged users | bytes |
| Filesystem Size [Free] | Available disk space (bytes) | bytes |
| Filesystem Size [Total] | Total disk space (bytes) | bytes |
| Filesystem Usage | Percentage of disk space used | % |
| Filesystem Usage [Avg] | Average of filesystem.used.pct across individual filesystems | % |
| Filesystem Usage [Inode] | Inode usage | % |
| Filesystem Usage [Max] | Maximum of filesystem.used.pct across individual filesystems | % |
| Filesystem Usage [Min] | Minimum of filesystem.used.pct across individual filesystems | % |
| Filesystem Usage [Total] | - | % |
| Filesystem Used | Used disk space (bytes) | bytes |
| Filesystem Used [Inode] | Inode usage | bytes |
| Memory Free | Total amount of free memory (bytes). Memory used by system caches and buffers is not included (see system.memory.actual.free). | bytes |
| Memory Free [Actual] | Actual free memory (bytes). The calculation method varies by OS. On Linux, it is MemAvailable from /proc/meminfo or, if that value is unavailable, free memory plus caches and buffers. On OSX, it is the sum of free memory and inactive memory. On Windows, it is the same value as system.memory.free. | bytes |
| Memory Free [Swap] | Available swap memory | bytes |
| Memory Total | Total memory | bytes |
| Memory Total [Swap] | Total swap memory | bytes |
| Memory Usage | Percentage of used memory: ((Memory Total - Memory Free) / Memory Total) * 100, where Memory Free is the amount of memory currently free | % |
| Memory Usage [Actual] | Percentage of memory actually used: ((Memory Total - Memory Available) / Memory Total) * 100, or ((Memory Total - (Memory Free + Buffers + Cached)) / Memory Total) * 100, where Memory Free is the amount of memory currently free, Buffers is the amount of memory used for buffers, and Cached is the amount of memory used for the page cache | % |
| Memory Usage [Cache Swap] | Cached swap usage rate | % |
| Memory Usage [Swap] | Percentage of used swap memory | % |
| Memory Used | Used memory | bytes |
| Memory Used [Actual] | Actual used memory (bytes): the value obtained by subtracting available memory from total memory. Available memory is calculated differently for each OS (see system.actual.free). | bytes |
| Memory Used [Swap] | Used swap memory | bytes |
| Collisions | Number of network collisions | cnt |
| Network In Bytes | Number of received bytes | bytes |
| Network In Bytes [Delta Avg] | Average of system.network.in.bytes_delta across individual networks | bytes |
| Network In Bytes [Delta Max] | Maximum of system.network.in.bytes_delta across individual networks | bytes |
| Network In Bytes [Delta Min] | Minimum of system.network.in.bytes_delta across individual networks | bytes |
| Network In Bytes [Delta Sum] | Sum of system.network.in.bytes_delta across individual networks | bytes |
| Network In Bytes [Delta] | Delta of the received byte count | bytes |
| Network In Dropped | Number of incoming packets that were dropped | cnt |
| Network In Errors | Number of errors during reception | cnt |
| Network In Packets | Number of received packets | cnt |
| Network In Packets [Delta Avg] | Average of system.network.in.packets_delta across individual networks | cnt |
| Network In Packets [Delta Max] | Maximum of system.network.in.packets_delta across individual networks | cnt |
| Network In Packets [Delta Min] | Minimum of system.network.in.packets_delta across individual networks | cnt |
| Network In Packets [Delta Sum] | Sum of system.network.in.packets_delta across individual networks | cnt |
| Network In Packets [Delta] | Delta of the received packet count | cnt |
| Network Out Bytes | Number of transmitted bytes | bytes |
| Network Out Bytes [Delta Avg] | Average of system.network.out.bytes_delta across individual networks | bytes |
| Network Out Bytes [Delta Max] | Maximum of system.network.out.bytes_delta across individual networks | bytes |
| Network Out Bytes [Delta Min] | Minimum of system.network.out.bytes_delta across individual networks | bytes |
| Network Out Bytes [Delta Sum] | Sum of system.network.out.bytes_delta across individual networks | bytes |
| Network Out Bytes [Delta] | Delta of the transmitted byte count | bytes |
| Network Out Dropped | Number of outgoing packets that were dropped. This value is not reported by the operating system on Darwin and BSD, so it is always 0 there. | cnt |
| Network Out Errors | Number of errors during transmission | cnt |
| Network Out Packets | Number of transmitted packets | cnt |
| Network Out Packets [Delta Avg] | Average of system.network.out.packets_delta across individual networks | cnt |
| Network Out Packets [Delta Max] | Maximum of system.network.out.packets_delta across individual networks | cnt |
| Network Out Packets [Delta Min] | Minimum of system.network.out.packets_delta across individual networks | cnt |
| Network Out Packets [Delta Sum] | Sum of system.network.out.packets_delta across individual networks | cnt |
| Network Out Packets [Delta] | Delta of the transmitted packet count | cnt |
| Open Connections [TCP] | All open TCP connections | cnt |
| Open Connections [UDP] | All open UDP connections | cnt |
| Port Usage | Usage rate of available ports | % |
| SYN Sent Sockets | Number of sockets in the SYN_SENT state (when connecting from local to remote) | cnt |
| Kernel PID Max | kernel.pid_max value | cnt |
| Kernel Thread Max | kernel.threads-max value | cnt |
| Process CPU Usage | Percentage of CPU time consumed by the process since the last update. This value is similar to the %CPU value shown for the process by the top command on Unix systems. | % |
| Process CPU Usage/Core | Percentage of CPU time used by the process since the last event, normalized by the number of cores, with values ranging from 0 to 100% | % |
| Process Memory Usage | Proportion of main memory (RAM) occupied by the process | % |
| Process Memory Used | Resident set size: the amount of memory the process occupies in RAM. On Windows, the current working set size. | bytes |
| Process PID | Process PID | PID |
| Process PPID | Parent process PID | PID |
| Processes [Dead] | Number of dead processes | cnt |
| Processes [Idle] | Number of idle processes | cnt |
| Processes [Running] | Number of running processes | cnt |
| Processes [Sleeping] | Number of sleeping processes | cnt |
| Processes [Stopped] | Number of stopped processes | cnt |
| Processes [Total] | Total number of processes | cnt |
| Processes [Unknown] | Number of processes whose state is unknown or could not be retrieved | cnt |
| Processes [Zombie] | Number of zombie processes | cnt |
| Running Process Usage | Process usage rate | % |
| Running Processes | Number of running processes | cnt |
| Running Thread Usage | Thread usage rate | % |
| Running Threads | Total number of threads running in running processes | cnt |
| Instance Status | Instance status | state |
| Context Switches | Context switch count (per second) | cnt |
| Load/Core [1 min] | Load over the last 1 minute divided by the number of cores | cnt |
| Load/Core [15 min] | Load over the last 15 minutes divided by the number of cores | cnt |
| Load/Core [5 min] | Load over the last 5 minutes divided by the number of cores | cnt |
| Multipaths [Active] | Number of external storage connection paths with status = active | cnt |
| Multipaths [Failed] | Number of external storage connection paths with status = failed | cnt |
| Multipaths [Faulty] | Number of external storage connection paths with status = faulty | cnt |
| NTP Offset | Measured offset of the last sample (the time difference between the NTP server and the local environment) | num |
| Run Queue Length | Run queue length | num |
| Uptime | OS uptime (milliseconds) | ms |
| Context Switchies | CPU context switch count (per second) | cnt |
| Disk Read Bytes [Sec] | Number of bytes read from a Windows logical disk in 1 second | cnt |
| Disk Read Time [Avg] | Average data read time (seconds) | sec |
| Disk Transfer Time [Avg] | Average disk wait time | sec |
| Disk Usage | Disk usage | % |
| Disk Write Bytes [Sec] | Number of bytes written to a Windows logical disk in 1 second | cnt |
| Disk Write Time [Avg] | Average data write time (seconds) | sec |
| Pagingfile Usage | Paging file usage | % |
| Pool Used [Non Paged] | Nonpaged pool usage in kernel memory | bytes |
| Pool Used [Paged] | Paged pool usage in kernel memory | bytes |
| Process [Running] | Number of currently running processes | cnt |
| Threads [Running] | Number of currently running threads | cnt |
| Threads [Waiting] | Number of threads waiting for processor time | cnt |
Table. Multi-node GPU Cluster [Node] Additional monitoring metrics (Agent installation required)
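
The Memory Usage, Memory Usage [Actual], and CPU Usage/Core rows in the table above each describe a small calculation. The following sketch works through those formulas with illustrative numbers. The function names and sample values are hypothetical, and this is only a reading of the formulas in the table, not the Agent's actual implementation.

```python
from typing import Optional


def memory_usage_pct(total: int, free: int) -> float:
    """Memory Usage: ((Memory Total - Memory Free) / Memory Total) * 100."""
    return (total - free) / total * 100


def memory_usage_actual_pct(
    total: int,
    available: Optional[int] = None,
    free: int = 0,
    buffers: int = 0,
    cached: int = 0,
) -> float:
    """Memory Usage [Actual].

    Uses Memory Available when it is given; otherwise falls back to
    Memory Free + Buffers + Cached, as in the second formula of the table.
    """
    if available is None:
        available = free + buffers + cached
    return (total - available) / total * 100


def cpu_usage_per_core_pct(cpu_usage_pct: float, cores: int) -> float:
    """Normalize a non-normalized CPU percentage by the number of cores."""
    return cpu_usage_pct / cores


GIB = 1024 ** 3  # bytes per GiB

# Illustrative values only: 16 GiB total, 4 GiB free, 1 GiB buffers, 3 GiB cached.
usage = memory_usage_pct(total=16 * GIB, free=4 * GIB)
actual = memory_usage_actual_pct(
    total=16 * GIB, free=4 * GIB, buffers=1 * GIB, cached=3 * GIB
)
# All 4 cores fully used: 400% non-normalized, 100% after normalization.
per_core = cpu_usage_per_core_pct(400.0, cores=4)

print(usage, actual, per_core)
```

In this example Memory Usage is 75% while Memory Usage [Actual] is 50%, because memory held by buffers and the page cache is counted as available in the second calculation.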