Data Ops-Based Workflow Creation and Management
Overview
Data Ops is a managed workflow orchestration service based on Apache Airflow that creates workflows for periodic, repetitive data processing tasks and automates job scheduling.
It can be used independently in the Kubernetes Engine cluster environment of Samsung Cloud Platform, or together with other application software.
Architecture Diagram
The System Manager requests the Data Ops service to manage workflows for periodic, repetitive data processing tasks (extraction, loading, transformation, and cleaning).
The Data Engineer can modify the Data Ops service settings through Ops Manager and manage additional plugin and library files.
The Data Ops service is built on Apache Airflow and lets you author, schedule, and monitor workflows in DAG (Directed Acyclic Graph) format.
- The Worker that performs the actual work runs dynamically.
You can run workflow-based tasks by integrating with various systems such as Data Flow, Cloud Hadoop, legacy systems, and Object Storage.
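To illustrate what "DAG format" means for a workflow, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not the Airflow API and not how Data Ops is implemented; the task names are hypothetical and simply mirror the extraction, cleaning, transformation, and loading steps mentioned above.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on,
# so the mapping as a whole describes a directed acyclic graph (DAG).
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "transform": {"clean"},
    "load": {"transform"},
}


def execution_order(deps):
    """Return a run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A scheduler can run tasks in this order because no task appears before the tasks it depends on. If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`, which is why workflows are restricted to acyclic graphs.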
Use Cases
Data-driven workflow orchestration
Data Ops can orchestrate data-driven workflows, especially ETL and ELT workflows.
It automatically organizes, monitors, and executes workflows.
For example, a workflow can run a job through Spark and store the results in Cloud Hadoop.
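As a rough picture of an ETL workflow that is executed and monitored step by step, the following sketch runs three plain Python functions in order and records a status for each. Everything here is an assumption for illustration: the function names, the sample rows, and the in-memory list that stands in for a storage target. It does not use Spark, Cloud Hadoop, or Airflow.

```python
def extract():
    """Stand-in for reading raw rows from a source system (hypothetical data)."""
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": " 4.5 "}]


def transform(rows):
    """Clean the raw string amounts and convert them to floats."""
    return [{"id": r["id"], "amount": float(r["amount"].strip())} for r in rows]


def load(rows, sink):
    """Append the transformed rows to a sink (a list stands in for storage)."""
    sink.extend(rows)


def run_etl(sink):
    """Run the steps in order, recording which steps succeeded and any error."""
    status = {}
    try:
        raw = extract()
        status["extract"] = "success"
        cleaned = transform(raw)
        status["transform"] = "success"
        load(cleaned, sink)
        status["load"] = "success"
    except Exception as exc:  # a failed step stops the run and is reported
        status["error"] = repr(exc)
    return status


sink = []
status = run_etl(sink)
print(status)
```

In an orchestrated workflow each function would typically be a separate task, so that a failure in one step is visible on its own and can be retried without rerunning the earlier steps.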
Batch workload
It can serve as an ETL or ELT pipeline that retrieves and transforms data from multiple sources.
You can improve the visibility of batch processes and separate batch jobs to shorten the development cycle.
It is suitable for batch processing tasks that can tolerate delays between job executions.
Enterprise Scheduling
By linking with command shells, APIs, and enterprise execution containers, you can schedule jobs with existing application tools.
You can communicate with existing services to orchestrate the data pipeline service.
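One way to picture command-shell integration is a task that launches an existing tool as an external process and checks its exit code. The sketch below uses Python's standard `subprocess` module; the `run_command` helper is hypothetical, and the Python interpreter itself stands in for an existing application tool so the example runs anywhere.

```python
import subprocess
import sys


def run_command(args, timeout=30):
    """Run an external command and return (exit_code, stripped stdout)."""
    completed = subprocess.run(
        args,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode output as str
        timeout=timeout,      # raise TimeoutExpired if the command hangs
        check=False,          # report the exit code instead of raising
    )
    return completed.returncode, completed.stdout.strip()


# The interpreter stands in for an existing tool (assumption for illustration).
code, out = run_command([sys.executable, "-c", "print('job done')"])
print(code, out)
```

A scheduler would treat a nonzero exit code as a failed task, which is what makes existing command-line tools straightforward to place inside a workflow.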
Prerequisites
None
Constraints
None
Considerations
An Ingress Controller must exist within the cluster before you can use Data Ops.
Related Services
This is a list of Samsung Cloud Platform services that are related to the features or configurations described in this guide. Refer to it when selecting and designing services.
| Service group | Service | Description |
|---|---|---|
| Container | Kubernetes Engine | Kubernetes container orchestration service |
| Storage | File Storage | Storage that enables multiple client servers to share files over a network connection. |
| Storage | Object Storage | Object storage that simplifies data storage and retrieval |
| Networking | VPC | A service that provides an isolated virtual network in a cloud environment |
| Networking | Security Group | Virtual firewall that controls VM traffic |
| Networking | Load Balancer | A service that automatically distributes server traffic load. |
| Data Analytics | Data Flow | A service that extracts, transforms, and transfers data from various sources and automates data processing workflows. |
