ServiceWatch Metrics
Kubernetes Engine sends metrics to ServiceWatch. The metrics provided by default monitoring are data collected at a 1‑minute interval.
Basic Metrics
The following are the basic metrics for the Kubernetes Engine namespace.
The metrics whose names are displayed in bold below are the metrics selected as key metrics among the default metrics provided by Kubernetes Engine. Key metrics are used to configure service dashboards that are automatically generated for each service in ServiceWatch.
Each metric indicates through the user guide which statistical values are meaningful when viewing that metric, and among the meaningful statistics, the values displayed in bold are the primary statistics. In the service dashboard, you can view key metrics using these primary statistical values.
| Indicator name | Detailed description | unit | meaningful statistics |
|---|---|---|---|
| cluster_up | Cluster up | Count |
|
| cluster_node_count | Cluster node count | Count |
|
| cluster_failed_node_count | Number of failed nodes in the cluster | Count |
|
| cluster_namespace_phase_count | Number of cluster namespace phases | Count |
|
| cluster_pod_phase_count | Number of cluster pod phases | Count |
|
| node_cpu_allocatable | Node CPU allocatable amount | - |
|
| node_cpu_capacity | Node CPU capacity | - |
|
| node_cpu_usage | Node CPU usage | - |
|
| node_cpu_utilization | Node CPU utilization | - |
|
| node_memory_allocatable | Node memory allocatable amount | Bytes |
|
| node_memory_capacity | Node memory capacity | Bytes |
|
| node_memory_usage | Node memory usage | Bytes |
|
| node_memory_utilization | Node memory usage rate | - |
|
| node_network_rx_bytes | Node network received bytes | Bytes/Second |
|
| node_network_tx_bytes | Node network transmitted bytes | Bytes/Second |
|
| node_network_total_bytes | Total bytes of the node network | Bytes/Second |
|
| node_number_of_running_pods | Number of pods running on a node | Count |
|
| namespace_number_of_running_pods | Number of running pods in a namespace | Count |
|
| namespace_deployment_pod_count | Namespace deployment pod count | Count |
|
| namespace_statefulset_pod_count | Namespace StatefulSet pod count | Count |
|
| namespace_daemonset_pod_count | Namespace DaemonSet Pod Count | Count |
|
| namespace_job_active_count | Active namespace job count | Count |
|
| namespace_cronjob_active_count | Number of active namespace cron jobs | Count |
|
| pod_cpu_usage | Pod CPU usage | - |
|
| pod_memory_usage | Pod memory usage | Bytes |
|
| pod_network_rx_bytes | Pod network received bytes | Bytes/Second |
|
| pod_network_tx_bytes | Pod network transmit bytes | Bytes/Second |
|
| pod_network_total_bytes | Pod network total bytes | Count |
|
| container_cpu_usage | Container CPU usage | - |
|
| container_cpu_limit | Container CPU limit | - |
|
| container_cpu_utilization | Container CPU usage | - |
|
| container_memory_usage | Container memory usage | Bytes |
|
| container_memory_limit | Container memory limit | Bytes |
|
| container_memory_utilization | Container memory usage | - |
|
| node_gpu_count | Number of node GPUs | Count |
|
| gpu_temp | GPU temperature | - |
|
| gpu_power_usage | GPU power consumption | - |
|
| gpu_util | GPU utilization | Percent |
|
| gpu_sm_clock | GPU SM clock | - |
|
| gpu_fb_used | GPU FB usage | Megabytes |
|
| gpu_tensor_active | GPU Tensor Utilization | - |
|
| pod_gpu_util | Pod GPU utilization | Percent |
|
| pod_gpu_tensor_active | Pod GPU Tensor Utilization | - |
|