Data Ops-based Workflow Authoring and Management

Overview

Data Ops is a managed workflow orchestration service based on Apache Airflow that creates workflows for periodic, repetitive data processing tasks and automates job scheduling.

It can be used independently in the Kubernetes Engine cluster environment of Samsung Cloud Platform, or together with other application software.

Architecture Diagram

Figure. Data Ops-based workflow management
  1. The System Manager requests the Data Ops service to manage workflows for periodic, repetitive data processing tasks (extraction, loading, transformation, and cleaning).

  2. The Data Engineer can modify the Data Ops service settings through Ops Manager and manage additional plugin and library files.

  3. The Data Ops service is built on Apache Airflow and lets you author, schedule, and monitor workflows in DAG (Directed Acyclic Graph) format.

    • The Worker that executes the actual tasks is launched dynamically.

  4. You can run workflow-based tasks by integrating with various systems such as Data Flow, Cloud Hadoop, legacy systems, and Object Storage.
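
To illustrate the DAG format mentioned in step 3, the following minimal sketch uses Python's standard-library graphlib module rather than Airflow itself. The task names (extract, transform, clean, load) are hypothetical. They only show how the dependencies in a directed acyclic graph determine a valid execution order, and why a cycle makes a workflow impossible to schedule.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "clean": {"transform"},
    "load": {"clean"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph contains no cycle."""
    try:
        TopologicalSorter(deps).prepare()
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)
```

In an actual Airflow deployment the same dependencies would be declared inside a DAG definition file; this sketch only demonstrates the ordering idea.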

Use Cases

Data-driven workflow orchestration

Data Ops can orchestrate data-driven workflows, especially ETL / ELT.

It automatically organizes, monitors, and executes workflows.

For example, a workflow can run a processing job through Spark and store the results in Cloud Hadoop.

Batch workload

It can serve as an ETL or ELT pipeline that retrieves and transforms data from multiple sources.

You can improve the visibility of batch processes and separate batch jobs to shorten the development cycle.

It is suited to batch processing tasks that can tolerate delays between job executions.
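
The batch pipeline described in this section can be sketched in plain Python as three separate stages. This is an illustration under stated assumptions, not the Data Ops implementation: the source names (orders_db, orders_csv), the record fields, and the in-memory warehouse list are all hypothetical stand-ins for real systems.

```python
def extract(sources):
    """Collect raw records from every source into one list, tagging the origin."""
    records = []
    for name, rows in sources.items():
        for row in rows:
            records.append({"source": name, **row})
    return records


def transform(records):
    """Drop rows without an amount and convert the amount to float."""
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:
            continue
        cleaned.append({**rec, "amount": float(rec["amount"])})
    return cleaned


def load(records, target):
    """Append transformed records to the target store and return the count."""
    target.extend(records)
    return len(records)


# Hypothetical in-memory sources standing in for real databases or files.
sources = {
    "orders_db": [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}],
    "orders_csv": [{"id": 3, "amount": "4"}],
}
warehouse = []
loaded = load(transform(extract(sources)), warehouse)
print(loaded, warehouse)
```

Keeping each stage as its own function mirrors the idea of separating batch jobs: each one could become an independent task in a workflow and be developed, retried, and monitored on its own.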

Enterprise Scheduling

By linking with command shells, APIs, and enterprise execution containers, you can schedule jobs together with existing application tools.

You can communicate with existing services to orchestrate the data pipeline service.
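
As a rough sketch of the command shell case, a workflow step can wrap an existing command-line tool and report its exit code so that a scheduler can decide whether the step succeeded. The helper name run_command and the sample command are assumptions made for illustration; here the Python interpreter stands in for an existing tool. Apache Airflow provides operators such as BashOperator for this kind of step, and how they are configured in Data Ops is not covered here.

```python
import subprocess
import sys


def run_command(args, timeout=60):
    """Run an external command and return (exit_code, stdout)."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout.strip()


# Hypothetical job: the Python interpreter stands in for an existing tool.
code, output = run_command([sys.executable, "-c", "print('job finished')"])
print(code, output)
```

A non-zero exit code would mark the step as failed, which is the signal an orchestrator typically uses to trigger retries or alerts.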

Prerequisites

None

Constraints

None

Considerations

To use Data Ops, an Ingress Controller must exist within the cluster.

Related Services

This is a list of Samsung Cloud Platform services that are related to the features or configurations described in this guide. Refer to it when selecting and designing services.

Service Group    Service            Description
Container        Kubernetes Engine  Kubernetes container orchestration service
Storage          File Storage       Storage that enables multiple client servers to share files over a network connection
Storage          Object Storage     Object storage that simplifies data storage and retrieval
Networking       VPC                A service that provides an isolated virtual network in a cloud environment
Networking       Security Group     Virtual firewall that controls VM traffic
Networking       Load Balancer      A service that automatically distributes server traffic load
Data Analytics   Data Flow          A service that extracts, transforms, and transfers data from various sources and automates data processing workflows
Table. List of related services