Overview

1: Monitoring Metrics
2: ServiceWatch metric

Service Overview

Kubernetes Engine is a service that provides lightweight virtual computing and containers, as well as a Kubernetes cluster to manage them. Because the service installs, operates, and maintains the Kubernetes Control Plane, users can use a Kubernetes environment without complex preparation.

Features

Standard Kubernetes Environment Configuration: The standard Kubernetes environment can be used without separate configuration through the default Kubernetes Control Plane provided. It is compatible with applications in other standard Kubernetes environments, so you can use standard Kubernetes applications without modifying the code.
Easy Kubernetes Deployment: Provides secure communication between worker nodes and managed control planes, and quickly provisions worker nodes, allowing users to focus on building applications on the provided container environment.
Convenient Kubernetes Management: Provides management features for the created Kubernetes cluster through a dashboard designed for enterprise environments, including cluster information inquiry, cluster management, namespace management, and workload management.

Service Composition Diagram

Figure. K8s Engine Configuration Diagram

Provided Features

Kubernetes Engine provides the following features.

Cluster Management: You can create and manage clusters to use the Kubernetes Engine service. After creating a cluster, you can add services necessary for operation, such as nodes, namespaces, and workloads.
Node Management: A node is a machine that runs containerized applications. Every cluster must have at least one worker node to deploy applications. Nodes are defined and used through node pools. All nodes in a node pool must have the same server type, size, and OS image, and multiple node pools can be created to establish a flexible deployment strategy.
Namespace Management: Namespace is a logical separation unit within a Kubernetes cluster, and is used to specify access permissions or resource usage limits by namespace.
Workload Management: Workload is an application running on Kubernetes Engine. You can create a namespace, then add or delete workloads. Workloads are created and managed item by item, such as deployments, pods, stateful sets, daemon sets, jobs, and cron jobs.
Service and Ingress Management: Service is an abstraction method that exposes applications running in a set of pods as a network service, and Ingress is used to expose HTTP and HTTPS paths from outside the cluster to the inside. After creating a namespace, you can create or delete services, endpoints, ingresses, and ingress classes.
Storage Management: When using Kubernetes Engine, you can create and manage the storage to be used. Storage is created and managed by items such as PVC, PV, and storage class.
Configuration Management: When values used inside a container differ across environments such as Dev and Prod, building a separate image for each environment just to change environment variables is inconvenient and wasteful. In Kubernetes, environment variables and configuration values can be managed outside the image and injected when a Pod is created, using ConfigMap and Secret.
Access Control: In cases where multiple users access a Kubernetes cluster, you can grant permissions for specific APIs or namespaces to restrict access. You can apply Kubernetes’ role-based access control (RBAC) feature to set permissions for clusters or namespaces. You can create and manage cluster roles, cluster role bindings, roles, and role bindings.
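The Configuration Management item above can be illustrated with a minimal sketch: the same container image is reused for Dev and Prod, and only the ConfigMap referenced by the Pod changes. The manifests are built as plain Python dictionaries and printed as JSON. The names `app-config-<env>`, `app-<env>`, the image reference, and the configuration keys are hypothetical and chosen only for illustration.

```python
import json


def build_config_map(env_name: str, values: dict) -> dict:
    """Build a ConfigMap manifest holding the settings for one environment."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"app-config-{env_name}"},  # hypothetical name
        # ConfigMap data values must be strings.
        "data": {key: str(value) for key, value in values.items()},
    }


def build_pod(env_name: str, image: str) -> dict:
    """Build a Pod manifest that loads its environment variables from the ConfigMap."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"app-{env_name}"},  # hypothetical name
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,  # identical image for every environment
                    "envFrom": [
                        {"configMapRef": {"name": f"app-config-{env_name}"}}
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    image = "registry.example.com/app:1.0"  # hypothetical image reference
    settings = {
        "dev": {"LOG_LEVEL": "debug", "CACHE_SIZE": 16},
        "prod": {"LOG_LEVEL": "info", "CACHE_SIZE": 512},
    }
    for env_name, values in settings.items():
        print(json.dumps(build_config_map(env_name, values), indent=2))
        print(json.dumps(build_pod(env_name, image), indent=2))
```

Sensitive values would go into a Secret rather than a ConfigMap; the structure of the reference from the Pod is analogous.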

Component

Control Plane

The Control Plane acts as the master node in the Kubernetes Engine service. The master node is the management node of the cluster: it assigns tasks within the cluster, monitors the status of the nodes, and handles data communication between nodes. The cluster is the basic creation unit of the Kubernetes Engine service and is used to manage the node pools, objects, controllers, and other components within it. Users configure the cluster name, control plane, network, File Storage, and other settings, and then create a node pool within the cluster to use it.

The cluster name creation rule is as follows.

The name must start with an English letter and be 3 to 30 characters long, using only English letters, numbers, and the hyphen (-).
The name must not duplicate the name of an existing cluster.
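The naming rules above can be checked before submitting a creation request. The following is a minimal sketch, not the service's own validation: it assumes that "English" means ASCII letters in either case, and the function name and the `existing_names` parameter are hypothetical.

```python
import re
from typing import Iterable

# Assumption: both upper- and lower-case ASCII letters are accepted.
# First character: a letter. Remaining 2 to 29 characters: letters, digits, or "-".
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,29}")


def is_valid_cluster_name(name: str, existing_names: Iterable[str] = ()) -> bool:
    """Return True if the name matches the format rule and is not already in use."""
    if CLUSTER_NAME_PATTERN.fullmatch(name) is None:
        return False
    return name not in set(existing_names)
```

For example, `is_valid_cluster_name("dev-cluster-01")` passes, while `"1cluster"` (starts with a digit), `"ab"` (too short), and `"dev_cluster"` (underscore not allowed) are rejected.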

Worker Node

The Worker Node is a node that performs the cluster’s tasks. It receives tasks from the cluster’s master node, performs them, and reports the results back to the master node. All nodes created within the node pool and namespace act as worker nodes.

The rules for creating a node pool, which is a collection of worker nodes, are as follows.

A node pool must contain at least one node before applications can be deployed.
Up to 100 nodes can be created in a node pool.
The 100-node maximum applies to the total across node pools: for example, 100 node pools with 1 node each, or 50 node pools with 2 nodes each. Nodes can be distributed freely among node pools as long as the total does not exceed 100.
Block Storage can be attached to a node pool.
The server type, size, and OS image can be set for a node pool, and all nodes in the pool must use the same values.
The Auto-Scaling service can automatically expand or shrink a node pool according to the requirements of the deployed applications.
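As a sketch of the node count rules above, the following check reads the 100-node limit as a total across all node pools, as the 100 × 1 and 50 × 2 examples suggest. The `NodePool` class and `validate_node_pools` function are hypothetical names used only for illustration and are not part of any Kubernetes Engine API.

```python
from dataclasses import dataclass
from typing import List

MAX_TOTAL_NODES = 100  # limit stated in the node pool rules


@dataclass
class NodePool:
    name: str
    node_count: int


def validate_node_pools(pools: List[NodePool]) -> List[str]:
    """Return a list of rule violations; an empty list means the plan is valid."""
    errors = []
    for pool in pools:
        # Each node pool needs at least one node.
        if pool.node_count < 1:
            errors.append(f"{pool.name}: needs at least 1 node")
    # The node limit is checked against the sum over all pools.
    total = sum(pool.node_count for pool in pools)
    if total > MAX_TOTAL_NODES:
        errors.append(f"total nodes {total} exceeds {MAX_TOTAL_NODES}")
    return errors
```

A plan of 50 pools with 2 nodes each returns no errors, while two pools of 60 and 41 nodes are reported as exceeding the total.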

Preceding Service

The following services must be configured before creating this service. For details, refer to the guide provided for each service and prepare them in advance.

Service Category	Service	Detailed Description
Networking	VPC	A service that provides an independent virtual network in a cloud environment
Networking	Security Group	A virtual firewall that controls the server’s traffic
Storage	File Storage	A storage service that allows multiple clients to share files over the network. Used as a Persistent Volume.

Table. Preceding services of Kubernetes Engine

1 - Monitoring Metrics

Kubernetes Engine Monitoring Metrics

The following table shows the monitoring metrics of Kubernetes Engine that can be checked through Cloud Monitoring. For detailed instructions on using Cloud Monitoring, refer to the Cloud Monitoring guide.

Performance Item	Detailed Description	Unit
Cluster Namespaces [Active]	Number of active namespaces	cnt
Cluster Namespaces [Total]	Total number of namespaces in the cluster	cnt
Cluster Nodes [Ready]	Number of nodes in READY state	cnt
Cluster Nodes [Total]	Total number of nodes in the cluster	cnt
Cluster Pods [Failed]	Number of failed pods in the cluster	cnt
Cluster Pods [Pending]	Number of pending pods in the cluster	cnt
Cluster Pods [Running]	Number of running pods in the cluster	cnt
Cluster Pods [Succeeded]	Number of succeeded pods in the cluster	cnt
Cluster Pods [Unknown]	Number of unknown pods in the cluster	cnt
Instance Status	Cluster status	status
Namespace Pods [Failed]	Number of failed pods in the namespace	cnt
Namespace Pods [Pending]	Number of pending pods in the namespace	cnt
Namespace Pods [Running]	Number of running pods in the namespace	cnt
Namespace Pods [Succeeded]	Number of succeeded pods in the namespace	cnt
Namespace Pods [Unknown]	Number of unknown pods in the namespace	cnt
Namespace GPU Clock Frequency	SM clock frequency in the namespace	MHz
Namespace GPU Memory Usage	GPU memory utilization in the namespace	%
Namespace GPU Usage	GPU utilization in the namespace	%
Node CPU Size [Allocatable]	Allocatable CPU in the node	cnt
Node CPU Size [Capacity]	CPU capacity in the node	cnt
Node CPU Usage	CPU usage in the node	%
Node CPU Usage [Request]	CPU request ratio in the node	%
Node CPU Used	CPU utilization in the node	status
Node Filesystem Usage	Filesystem usage in the node	%
Node Memory Size [Allocatable]	Allocatable memory in the node	bytes
Node Memory Size [Capacity]	Memory capacity in the node	bytes
Node Memory Usage	Memory utilization in the node	%
Node Memory Usage [Request]	Memory request ratio in the node	%
Node Memory Workingset	Memory working set in the node	bytes
Node Network In Bytes	Node network received bytes	bytes
Node Network Out Bytes	Node network transmitted bytes	bytes
Node Network Total Bytes	Node network total bytes	bytes
Node Pods [Failed]	Number of failed pods in the node	cnt
Node Pods [Pending]	Number of pending pods in the node	cnt
Node Pods [Running]	Number of running pods in the node	cnt
Node Pods [Succeeded]	Number of succeeded pods in the node	cnt
Node Pods [Unknown]	Number of unknown pods in the node	cnt
Pod CPU Usage [Limit]	CPU usage limit ratio in the pod	%
Pod CPU Usage [Request]	CPU request ratio in the pod	%
Pod CPU Usage	CPU usage in the pod	%
Pod GPU Clock Frequency	SM clock frequency in the pod	MHz
Pod GPU Memory Usage	GPU memory utilization in the pod	%
Pod GPU Usage	GPU utilization in the pod	%
Pod Memory Usage [Limit]	Memory usage limit ratio in the pod	%
Pod Memory Usage [Request]	Memory request ratio in the pod	%
Pod Memory Usage	Memory usage in the pod	bytes
Pod Network In Bytes	Pod network received bytes	bytes
Pod Network Out Bytes	Pod network transmitted bytes	bytes
Pod Network Total Bytes	Pod network total bytes	bytes
Pod Restart Containers	Container restart count in the pod	cnt
Workload Pods [Running]	Number of running pods in the workload	cnt

Table. Kubernetes Engine Monitoring Metrics
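The bracketed suffixes of the pod count metrics in the table (Pending, Running, Succeeded, Failed, Unknown) correspond to the Kubernetes pod phases. The sketch below shows how such per-phase counts relate to a list of pod phases. It is an illustration only and does not describe how Cloud Monitoring collects the values; the function name and the input list are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# The five pod phases defined by Kubernetes.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def count_pod_phases(phases: Iterable[str]) -> Dict[str, int]:
    """Count pods per phase, reporting 0 for phases with no pods."""
    counts = Counter(phases)
    return {phase: counts.get(phase, 0) for phase in POD_PHASES}
```

Passing only the phases of pods in one namespace or on one node gives the namespace-level or node-level counts in the same way.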

2 - ServiceWatch metric

Kubernetes Engine sends metrics to ServiceWatch. Metrics provided by basic monitoring are collected at 1-minute intervals.

Reference

To check metrics in ServiceWatch, refer to the ServiceWatch guide.

Basic Metrics

The following are the basic metrics for the namespace Kubernetes Engine.

Metric name	Detailed description	Unit	Meaningful statistics
cluster_up	Cluster up	Count	Total
cluster_node_count	Cluster node count	Count	Sum
cluster_failed_node_count	Cluster failed node count	Count	Total
cluster_namespace_phase_count	Cluster namespace phase count	Count	Total
cluster_pod_phase_count	Cluster pod phase count	Count	Total
node_cpu_allocatable	Node CPU allocatable	-	Total
node_cpu_capacity	Node CPU capacity	-	Total
node_cpu_usage	Node CPU usage	-	Total
node_cpu_utilization	Node CPU utilization	-	Total
node_memory_allocatable	Node memory allocatable	Bytes	Total
node_memory_capacity	Node memory capacity	Bytes	Total
node_memory_usage	Node memory usage	Bytes	Total
node_memory_utilization	Node memory utilization	-	Total
node_network_rx_bytes	Node network receive bytes	Bytes/Second	Total
node_network_tx_bytes	Node network transmission bytes	Bytes/Second	Total
node_network_total_bytes	Node network total bytes	Bytes/Second	Total
node_number_of_running_pods	Node running pod count	Count	Total
namespace_number_of_running_pods	Namespace running pod count	Count	Total
namespace_deployment_pod_count	Namespace deployment pod count	Count	Total
namespace_statefulset_pod_count	Namespace StatefulSet pod count	Count	Total
namespace_daemonset_pod_count	Namespace DaemonSet pod count	Count	Total
namespace_job_active_count	Namespace job active count	Count	Total
namespace_cronjob_active_count	Namespace CronJob active count	Count	Total
pod_cpu_usage	Pod CPU usage	-	Total
pod_memory_usage	Pod memory usage	Bytes	Total
pod_network_rx_bytes	Pod network receive bytes	Bytes/Second	Total
pod_network_tx_bytes	Pod network transmission bytes	Bytes/Second	Total
pod_network_total_bytes	Pod network total bytes	Bytes/Second	Total
container_cpu_usage	Container CPU usage	-	Total
container_cpu_limit	Container CPU limit	-	Total
container_cpu_utilization	Container CPU utilization	-	Total
container_memory_usage	Container memory usage	Bytes	Total
container_memory_limit	Container memory limit	Bytes	Total
container_memory_utilization	Container memory utilization	-	Total
node_gpu_count	Node GPU count	Count	Total
gpu_temp	GPU temperature	-	Total
gpu_power_usage	GPU power usage	-	Total
gpu_util	GPU utilization	Percent	Total
gpu_sm_clock	GPU SM clock	-	Total
gpu_fb_used	GPU FB usage	Megabytes	Total
gpu_tensor_active	GPU tensor activation rate	-	Total
pod_gpu_util	Pod GPU usage rate	Percent	Total
pod_gpu_tensor_active	Pod GPU tensor activation rate	-	Total

Table. Kubernetes Engine Basic Metrics
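The table lists usage, limit, and utilization metrics side by side (for example `container_memory_usage`, `container_memory_limit`, and `container_memory_utilization`) but does not state a unit or formula for the utilization values. The sketch below assumes that utilization is usage divided by the limit, expressed as a percentage. This is an assumption for illustration and not a documented ServiceWatch definition; the function name is hypothetical.

```python
from typing import Optional


def utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of the limit, or None if no positive limit is set."""
    if limit <= 0:
        # Assumption: without a positive limit, no ratio can be computed.
        return None
    return usage / limit * 100.0
```

Under this assumption, a container using 256 MiB against a 1 GiB limit would have a memory utilization of 25 percent.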