Overview
Service Overview
Data Ops is a managed workflow orchestration service based on Apache Airflow. It lets you define workflows for data processing tasks that run periodically or repeatedly, and it automates their scheduling. You can deliver useful data to the right place at the required time and monitor the configuration and progress of your data pipelines.
Provided features
Data Ops provides the following features.
- Convenient Installation and Management: Data Ops can be easily installed via a web-based Console in a standard Kubernetes cluster environment. Apache Airflow and management modules are installed automatically, and the integrated dashboard provides unified monitoring of the web server and scheduler execution status.
- Dynamic Pipeline Configuration: You can define pipelines for data tasks in Python code. Because pipeline definitions are integrated with task scheduling and tasks can be created dynamically, you can freely design the workflow structure and schedule you need.
- Convenient Workflow Management: DAG (Directed Acyclic Graph) configurations are visualized and managed through a web-based UI, so you can easily see which steps of a data flow run in sequence and which run in parallel. You can also manage each task’s timeout, retry count, and priority.
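To make the DAG idea concrete, the following is a minimal sketch in plain Python, not the Airflow API. It creates tasks dynamically from a list, attaches the per-task settings named above (timeout, retry count, priority), and groups tasks into stages that could run in parallel. The names `TaskSpec`, `sources`, and `parallel_stages`, and all values, are hypothetical and chosen only for illustration. It requires Python 3.9 or later for `graphlib`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical per-task settings: timeout, retry count, priority."""
    timeout_s: int = 300
    retries: int = 0
    priority: int = 1


# Hypothetical input: one extract/transform pair is generated per source.
sources = ["orders", "customers"]

tasks = {"load": TaskSpec(timeout_s=900, retries=1, priority=5)}
deps = {"load": set()}  # task name -> set of upstream task names

for name in sources:
    tasks[f"extract_{name}"] = TaskSpec(retries=3)
    tasks[f"transform_{name}"] = TaskSpec(timeout_s=600, retries=2, priority=2)
    deps[f"extract_{name}"] = set()
    deps[f"transform_{name}"] = {f"extract_{name}"}
    deps["load"].add(f"transform_{name}")


def parallel_stages(graph):
    """Group tasks into stages; tasks in the same stage do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = parallel_stages(deps)
for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

In this sketch the two extract tasks form the first stage, the two transform tasks the second, and `load` the last. In Data Ops itself, the equivalent structure would be defined as an Airflow DAG and shown in the web-based UI.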
Components
Data Ops consists of Manager and Service modules and provides a packaged Apache Airflow.
Data Ops Manager
Data Ops Manager provides management features that enable more efficient use of Airflow.
- Through Data Ops Manager, you can upload Plugin File, Shared File, and Python Library File items for use in Data Ops Service.
- You can easily provision configuration information for Airflow components within the cluster.
- You can manage configuration information for other services within the Airflow cluster and provision it easily.
Data Ops Service
- Data Ops Service provides a managed workflow orchestration service based on Apache Airflow.
- When creating the service, you can set the Description, required resource size, DAGs GitSync, and Host Alias.
- After creating the service, you can modify the Description, resource size, DAGs GitSync, and Host Alias to apply changes to the service.
Server specifications
When creating a Data Ops service, check the following.
- Recommended Service Installation Specifications: CPU 43 cores (KubernetesExecutor) or 25 cores (CeleryExecutor), Memory 50 GB, Storage 100 GB or more
- Before creating the Data Ops service, you need to install the Ingress Controller.
- Only one Ingress Controller can be installed in a Kubernetes cluster.
- For more details, refer to Ingress Controller Installation.
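As a rough illustration, the recommended specifications above can be written as a small pre-check. This is a sketch only: the function `check_capacity` and the constant names are hypothetical and not part of Data Ops, and the available capacity values must be supplied by you, for example from your cluster's resource summary.

```python
# Recommended values copied from the list above; all names are illustrative.
RECOMMENDED_CPU_CORES = {"KubernetesExecutor": 43, "CeleryExecutor": 25}
RECOMMENDED_MEMORY_GB = 50
RECOMMENDED_STORAGE_GB = 100


def check_capacity(executor, cpu_cores, memory_gb, storage_gb):
    """Return a list of shortfalls; an empty list means the recommendation is met."""
    if executor not in RECOMMENDED_CPU_CORES:
        raise ValueError(f"unknown executor: {executor}")
    required = {
        "cpu_cores": RECOMMENDED_CPU_CORES[executor],
        "memory_gb": RECOMMENDED_MEMORY_GB,
        "storage_gb": RECOMMENDED_STORAGE_GB,
    }
    available = {
        "cpu_cores": cpu_cores,
        "memory_gb": memory_gb,
        "storage_gb": storage_gb,
    }
    return [
        f"{key}: have {available[key]}, recommended {required[key]}"
        for key in required
        if available[key] < required[key]
    ]


# Example with made-up capacity figures.
for problem in check_capacity("KubernetesExecutor", 32, 64, 200):
    print(problem)
```

With these made-up figures, the same capacity would satisfy the CeleryExecutor recommendation but fall short on CPU for KubernetesExecutor.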
Provision status by region
Data Ops is available in the environments below.
| Region | Provision status |
|---|---|
| Korea West (kr-west1) | Provided |
| Korea East (kr-east1) | Provided |
| Korea South 1 (kr-south1) | Not provided |
| Korea South 2 (kr-south2) | Not provided |
| Korea South 3 (kr-south3) | Not provided |
Prerequisite services
The following services must be configured before you create the Data Ops service. Refer to the guide provided for each service for details, and prepare them in advance.
| Service Category | Service | Detailed description |
|---|---|---|
| Storage | File Storage | Storage that enables multiple client servers to share files over a network connection. |
| Container | Kubernetes Engine | Kubernetes container orchestration service |
