
Overview

Service Overview

Data Ops is a managed workflow orchestration service based on Apache Airflow. It lets you author workflows for periodic or repetitive data processing tasks as code and automates their scheduling. Users can automate the process of delivering useful data to the right place at the right time, and monitor the configuration and progress of their data pipelines.

Architecture Diagram
Figure. Data Ops Architecture Diagram

Provided Features

Data Ops provides the following features.

  • Easy installation and management: Data Ops can be easily installed in a standard Kubernetes cluster environment through a web-based console. Apache Airflow and its management modules are installed automatically, and the execution status of the web server and scheduler can be monitored through an integrated dashboard.
  • Dynamic pipeline composition: Pipelines for data tasks are composed in Python code. Because tasks are generated dynamically in conjunction with scheduling, you can freely compose the workflow shape and schedule you want.
  • Convenient workflow management: DAG (Directed Acyclic Graph) configurations are visualized and managed through a web-based UI, making it easy to understand upstream and parallel relationships in the data flow. Each task's timeout, retry count, priority, and more can also be easily managed.
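As a sketch of the dynamic composition described above, the snippet below generates one task definition per data source from plain Python data. The source names are hypothetical; in a real Airflow DAG file each definition would be passed to an operator such as PythonOperator, where `retries`, `execution_timeout`, and `priority_weight` are the standard per-task arguments.

```python
from datetime import timedelta

# Hypothetical list of data sources; in practice this could come from
# configuration, a database, or any other Python-accessible input.
sources = ["orders", "users", "events"]

# Dynamically generate one task definition per source. In a real Airflow
# DAG file, each dict below would be passed to an operator, e.g.
#   PythonOperator(task_id=d["task_id"], retries=d["retries"], ...)
task_defs = [
    {
        "task_id": f"extract_{source}",
        "retries": 2,                                # per-task retry count
        "execution_timeout": timedelta(minutes=30),  # per-task timeout
        "priority_weight": 1,                        # per-task priority
    }
    for source in sources
]

for d in task_defs:
    print(d["task_id"])
```

Because the task list is ordinary Python data, changing the pipeline's shape is a matter of changing the data that drives the loop rather than editing each task by hand.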

Component

Data Ops consists of Manager and Service modules, and provides Apache Airflow by packaging it.

Data Ops Manager

Data Ops Manager provides various management functions for using Airflow more efficiently.

  • You can upload plugin files, shared files, and Python library files to be used in Data Ops Service through Data Ops Manager.
  • You can easily provision configuration for the Airflow components within the cluster.
  • You can manage and easily provision settings for each service within the Airflow cluster.

Data Ops Service

  • Provides a managed workflow orchestration service based on Apache Airflow.
  • When creating a service, you can set a Description, the required resource size, DAGs GitSync, and Host Alias.
  • After creating a service, you can modify the Description, resource usage, DAGs GitSync, and Host Alias, and the changes are applied to the service.
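For the DAGs GitSync and Host Alias settings mentioned above, the fragment below is a sketch following the Apache Airflow Helm chart's values convention; the repository URL, branch, IP, and hostname are placeholders, and the exact field names exposed by the Data Ops console may differ.

```yaml
# Sketch of DAGs GitSync and Host Alias settings, following the Apache
# Airflow Helm chart's values.yaml convention. All values are placeholders;
# the Data Ops console may expose these under different names.
dags:
  gitSync:
    enabled: true
    repo: https://github.example.com/team/airflow-dags.git  # placeholder repo
    branch: main
    subPath: dags      # directory inside the repo containing DAG files
# Host aliases follow the standard Kubernetes pod-spec shape.
hostAliases:
  - ip: "10.0.0.10"                 # placeholder internal IP
    hostnames:
      - "internal-db.example.com"   # placeholder hostname
```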

Server Spec Type

When creating a Data Ops service, check the following requirements.

  • Recommended service installation specifications: CPU 43 cores (KubernetesExecutor) or 25 cores (CeleryExecutor), memory 50 GB, storage 100 GB or more
Note
  • An Ingress Controller must be installed before creating a Data Ops service.
  • Only one Ingress Controller can be installed in a Kubernetes cluster.
  • For more details, refer to Ingress Controller installation.

Regional Provision Status

Data Ops is available in the following environments.

Region                       Availability
Korea West (kr-west1)        Provided
Korea East (kr-east1)        Not provided
Korea South (kr-south1)      Provided
Korea Central (kr-central)   Provided
Korea South 3 (kr-south3)    Provided
Table. Data Ops Regional Provision Status

Preceding Service

This is a list of services that must be pre-configured before creating this service. Please refer to the guide provided for each service and prepare in advance.

Service Category   Service              Detailed Description
Storage            File Storage         Storage that allows multiple client servers to share files over network connections
Container          Kubernetes Engine    Kubernetes container orchestration service
Container          Container Registry   A service for easily storing, managing, and sharing container images
Table. Data Ops Preceding Service