Overview


Service Overview

Data Ops is a managed workflow orchestration service based on Apache Airflow. It lets you build workflows for data processing tasks that run periodically or repeatedly, and it automates job scheduling. You can automate the delivery of data to the right place at the required time and monitor the configuration and progress of your data pipelines.

Provided features

Data Ops provides the following features.

Convenient installation and management: Data Ops can be installed through the web-based Console on a standard Kubernetes cluster. Apache Airflow and the management module are installed automatically, and the unified dashboard provides integrated monitoring of the web server and scheduler status.
Dynamic pipeline configuration: Pipelines for data tasks are defined in Python code. Because tasks are generated dynamically as data jobs are scheduled, you can freely design the workflow structure and schedule you need.
Convenient workflow management: DAG (Directed Acyclic Graph) structures are visualized and managed through a web-based UI, which makes the sequential and parallel relationships in the data flow easy to understand. You can also manage each task's timeout, retry count, and priority.
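The features above say three things: pipelines are written in Python, tasks can be generated dynamically, and the DAG determines which steps run in sequence and which can run in parallel. The sketch below illustrates these ideas in plain Python. It does not use Airflow's own API.

- `Task`, `build_tasks`, and `execution_stages` are hypothetical names.
- The per-task `retries`, `timeout_s`, and `priority` fields only loosely stand in for the Airflow operator arguments `retries`, `execution_timeout`, and `priority_weight`.
- It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into stages whose members do not depend on each other.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    # Hypothetical task record, not Airflow's API. In Airflow, similar settings
    # are passed to operators as retries, execution_timeout, and priority_weight.
    task_id: str
    upstream: list[str] = field(default_factory=list)
    retries: int = 0
    timeout_s: int = 3600
    priority: int = 1


def build_tasks(tables: list[str]) -> list[Task]:
    """Dynamically generate one extract task per table, plus a load task that depends on all of them."""
    tasks = [Task(f"extract_{name}", retries=2) for name in tables]
    tasks.append(Task("load_warehouse", upstream=[t.task_id for t in tasks], priority=10))
    return tasks


def execution_stages(tasks: list[Task]) -> list[list[str]]:
    """Group tasks into stages; tasks in the same stage have no mutual dependency and may run in parallel."""
    sorter = TopologicalSorter({t.task_id: t.upstream for t in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose upstream tasks are done
        stages.append(ready)
        sorter.done(*ready)
    return stages
```

For example, `execution_stages(build_tasks(["orders", "customers"]))` places both extract tasks in the first stage and `load_warehouse` in the second. A dependency cycle raises `graphlib.CycleError`, because such a graph would not be a DAG.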

Components

Data Ops consists of Manager and Service modules and provides Apache Airflow as a packaged solution.

Data Ops Manager

Data Ops Manager provides management features that help you use Airflow more efficiently.

You can upload Plugin Files, Shared Files, and Python Library Files through Data Ops Manager for use in Data Ops Service.
You can provision configuration for Airflow components within the cluster.
You can manage and provision configuration for other services within the Airflow cluster.

Data Ops Service

Data Ops Service provides managed workflow orchestration based on Apache Airflow.
When creating an Airflow service, you can set the Description, required resource size, DAGs GitSync, and Host Alias.
After the service is created, you can modify the Description, resource size, DAGs GitSync, and Host Alias, and the changes are applied to the service.

Server spec type

When creating a Data Ops service, check the following.

Recommended specifications for service installation: CPU 43 cores with KubernetesExecutor or 25 cores with CeleryExecutor, memory 50 GB, and storage of at least 100 GB.
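Because the CPU recommendation depends on the executor type, a pre-installation check reduces to a lookup plus a few comparisons. This is a minimal sketch with two assumptions:

- All figures are treated as minimums. The source says "at least" only for storage.
- `ClusterCapacity` and `spec_shortfalls` are hypothetical helpers, not part of Data Ops.

```python
from dataclasses import dataclass

# Recommended figures from the Data Ops server spec guidance.
# Assumption: every figure is treated as a minimum requirement.
RECOMMENDED_CPU_CORES = {"KubernetesExecutor": 43, "CeleryExecutor": 25}
RECOMMENDED_MEMORY_GB = 50
RECOMMENDED_STORAGE_GB = 100


@dataclass
class ClusterCapacity:
    # Hypothetical summary of the resources available to the service.
    cpu_cores: float
    memory_gb: float
    storage_gb: float


def spec_shortfalls(capacity: ClusterCapacity, executor: str) -> list[str]:
    """Return a description of each resource that falls below the recommended value."""
    if executor not in RECOMMENDED_CPU_CORES:
        raise ValueError(f"unknown executor: {executor}")
    problems = []
    need_cpu = RECOMMENDED_CPU_CORES[executor]  # CPU depends on the executor type
    if capacity.cpu_cores < need_cpu:
        problems.append(f"cpu: {capacity.cpu_cores} < {need_cpu} cores")
    if capacity.memory_gb < RECOMMENDED_MEMORY_GB:
        problems.append(f"memory: {capacity.memory_gb} < {RECOMMENDED_MEMORY_GB} GB")
    if capacity.storage_gb < RECOMMENDED_STORAGE_GB:
        problems.append(f"storage: {capacity.storage_gb} < {RECOMMENDED_STORAGE_GB} GB")
    return problems
```

Take a cluster with 32 cores, 64 GB of memory, and 200 GB of storage. The check passes for CeleryExecutor but reports a CPU shortfall for KubernetesExecutor.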

Provision status by region

Data Ops is available in the environments below.

Region	Availability
Korea West (kr-west1)	Available
Korea East (kr-east1)	Available
Korea South 1 (kr-south1)	Not available
Korea South 2 (kr-south2)	Not available
Korea South 3 (kr-south3)	Not available

Table. Data Ops regional availability status

Prerequisite services

The following services must be configured before you create a Data Ops service. Refer to the guide for each service and prepare them in advance.

Service Category	Service	Details
Storage	File Storage	Storage where multiple client servers share files via a network connection
Container	Kubernetes Engine	Kubernetes container orchestration service

Table. Data Ops prerequisite services

1 - ServiceWatch metric

In ServiceWatch, you can view the metrics of the Kubernetes Engine cluster created by Data Ops. As with Kubernetes Engine, the metrics provided by default monitoring are collected at one-minute intervals.

Reference

For instructions on viewing metrics, refer to the ServiceWatch guide.

Basic Metrics

The following are the default metrics for the Kubernetes Engine namespace.

Metrics whose names appear in bold in the table below are the key metrics selected from the default metrics provided by Kubernetes Engine. Key metrics are used to build the service dashboard that ServiceWatch creates automatically for each service.

For each metric, the table lists the statistics that are meaningful when querying it. Among these, the statistic shown in bold is the primary statistic. The service dashboard displays key metrics using their primary statistics.
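The statistics listed for each metric (Total, Average, Maximum, Minimum) are typically aggregates over the one-minute data points collected for that metric. The sketch below shows how such statistics can be derived from one-minute samples, assuming simple epoch-aligned fixed periods. It is illustrative only and does not describe how ServiceWatch actually computes them. `Sample` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    # One data point, e.g. a one-minute value of node_number_of_running_pods.
    timestamp: datetime
    value: float


def summarize(samples: list[Sample], period: timedelta) -> dict[datetime, dict[str, float]]:
    """Bucket samples into fixed periods and compute Total, Average, Maximum, and Minimum per bucket.

    Assumption: periods are aligned to the Unix epoch; ServiceWatch's real
    aggregation rules are not specified in this guide.
    """
    epoch = datetime(1970, 1, 1)
    buckets: dict[datetime, list[float]] = {}
    for s in samples:
        # Align each sample to the start of the period that contains it.
        start = epoch + ((s.timestamp - epoch) // period) * period
        buckets.setdefault(start, []).append(s.value)
    return {
        start: {
            "Total": sum(vals),
            "Average": sum(vals) / len(vals),
            "Maximum": max(vals),
            "Minimum": min(vals),
        }
        for start, vals in sorted(buckets.items())
    }
```

For example, ten one-minute samples summarized with `period=timedelta(minutes=5)` produce two buckets, each with its own Total, Average, Maximum, and Minimum.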

Metric name	Description	Unit	Meaningful statistics
cluster_up	Cluster up	Count	Total Average Maximum Minimum
cluster_node_count	Number of cluster nodes	Count	Total Average Maximum Minimum
cluster_failed_node_count	Number of failed nodes in the cluster	Count	Total Average Maximum Minimum
cluster_namespace_phase_count	Number of namespaces in the cluster, by phase	Count	Total Average Maximum Minimum
cluster_pod_phase_count	Number of pods in the cluster, by phase	Count	Total Average Maximum Minimum
node_cpu_allocatable	Node CPU allocatable amount	-	Total Average Maximum Minimum
node_cpu_capacity	Node CPU capacity	-	Total Average Maximum Minimum
node_cpu_usage	Node CPU usage	-	Total Average Maximum Minimum
node_cpu_utilization	Node CPU utilization	-	Total Average Maximum Minimum
node_memory_allocatable	Node memory allocatable amount	Bytes	Total Average Maximum Minimum
node_memory_capacity	Node memory capacity	Bytes	Total Average Maximum Minimum
node_memory_usage	Node memory usage	Bytes	Total Average Maximum Minimum
node_memory_utilization	Node memory utilization	-	Total Average Maximum Minimum
node_network_rx_bytes	Node network received bytes	Bytes/Second	Total Average Maximum Minimum
node_network_tx_bytes	Node network transmitted bytes	Bytes/Second	Total Average Maximum Minimum
node_network_total_bytes	Total bytes of the node network	Bytes/Second	Total Average Maximum Minimum
node_number_of_running_pods	Number of pods running on the node	Count	Total Average Maximum Minimum
namespace_number_of_running_pods	Number of running pods in the namespace	Count	Total Average Maximum Minimum
namespace_deployment_pod_count	Namespace deployment pod count	Count	Total Average Maximum Minimum
namespace_statefulset_pod_count	Namespace StatefulSet pod count	Count	Total Average Maximum Minimum
namespace_daemonset_pod_count	Namespace DaemonSet pod count	Count	Total Average Maximum Minimum
namespace_job_active_count	Active namespace job count	Count	Total Average Maximum Minimum
namespace_cronjob_active_count	Number of active namespace cronjobs	Count	Total Average Maximum Minimum
pod_cpu_usage	Pod CPU usage	-	Total Average Maximum Minimum
pod_memory_usage	Pod memory usage	Bytes	Total Average Maximum Minimum
pod_network_rx_bytes	Pod network received bytes	Bytes/Second	Total Average Maximum Minimum
pod_network_tx_bytes	Pod network transmitted bytes	Bytes/Second	Total Average Maximum Minimum
pod_network_total_bytes	Pod network total bytes	Bytes/Second	Total Average Maximum Minimum
container_cpu_usage	Container CPU usage	-	Total Average Maximum Minimum
container_cpu_limit	Container CPU limit	-	Total Average Maximum Minimum
container_cpu_utilization	Container CPU utilization	-	Total Average Maximum Minimum
container_memory_usage	Container memory usage	Bytes	Total Average Maximum Minimum
container_memory_limit	Container memory limit	Bytes	Total Average Maximum Minimum
container_memory_utilization	Container memory utilization	-	Total Average Maximum Minimum
node_gpu_count	Node GPU count	Count	Total Average Maximum Minimum
gpu_temp	GPU temperature	-	Total Average Maximum Minimum
gpu_power_usage	GPU power usage	-	Total Average Maximum Minimum
gpu_util	GPU utilization	Percent	Total Average Maximum Minimum
gpu_sm_clock	GPU SM clock	-	Total Average Maximum Minimum
gpu_fb_used	GPU frame buffer (FB) memory usage	Megabytes	Total Average Maximum Minimum
gpu_tensor_active	GPU Tensor Core utilization	-	Total Average Maximum Minimum
pod_gpu_util	Pod GPU utilization	Percent	Total Average Maximum Minimum
pod_gpu_tensor_active	Pod GPU Tensor Core utilization	-	Total Average Maximum Minimum

Table. Kubernetes Engine Basic Metrics