The page has been translated by Gen AI.

Data Ops-Based Workflow Creation and Management

Overview

Data Ops is a managed workflow orchestration service based on Apache Airflow that creates and automates workflows for periodic and repetitive data processing tasks.

It runs in a Samsung Cloud Platform Kubernetes Engine cluster environment, either independently or together with other application software.

Architecture Diagram

Fig. Data Ops-Based Workflow Management
  1. The system manager applies for the Data Ops service to manage workflows for periodic and repetitive data processing tasks (extraction, loading, transformation, and refinement).

  2. The data engineer can modify the Data Ops service settings and manage additional plugins and library files through Ops Manager.

  3. The Data Ops service is based on Apache Airflow and supports writing, scheduling, and monitoring workflows defined as DAGs (Directed Acyclic Graphs).

    • The workers that execute the actual tasks run dynamically.

  4. The service can perform workflow-based tasks in conjunction with various systems such as Data Flow, Cloud Hadoop, legacy systems, and Object Storage.
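To make the DAG idea in step 3 concrete, the following is a minimal sketch using only the Python standard library. It is not Airflow code and not the Data Ops API. The task names are hypothetical and simply mirror the processing stages named in step 1. It shows how a set of task dependencies yields a valid execution order, and how a cyclic dependency is rejected.

```python
import graphlib

# Hypothetical task names mirroring the stages listed above (assumption, not a Data Ops API).
# Each key maps a task to the set of tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "refine": {"transform"},
}


def execution_order(deps):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is what makes the graph "acyclic" in practice.
    """
    return list(graphlib.TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['extract', 'load', 'transform', 'refine']
```

In Airflow itself, the same relationships are declared between operators inside a DAG definition, typically with the `>>` operator, and the scheduler decides when each task runs.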

Use Cases

Data-Driven Workflow Orchestration

Data Ops can orchestrate data-driven workflows, especially ETL/ELT.

It automatically organizes, monitors, and executes workflows.

For example, tasks can be executed through Spark and the results stored in Cloud Hadoop.

Batch Workloads

Data Ops can serve as an ETL or ELT pipeline that fetches and transforms data from multiple sources.

Separating batch tasks increases the visibility of batch processes and shortens the development cycle.

It is suitable for batch processing that can tolerate delays between task executions.
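As an illustration of separating a batch job into small tasks, here is a hedged sketch in plain Python. The two sources, their sample rows, and all function names are hypothetical. In a real workflow each function would become its own task, so that it can be monitored and rerun on its own.

```python
def fetch_orders():
    # Stand-in for reading from a first source (hypothetical sample rows).
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "4.0"}]


def fetch_refunds():
    # Stand-in for reading from a second source (hypothetical sample rows).
    return [{"id": 1, "amount": "2.5"}]


def transform(rows):
    # Normalize the raw string amounts into floats.
    return [{"id": row["id"], "amount": float(row["amount"])} for row in rows]


def net_amounts(orders, refunds):
    # Combine both sources: subtract refunds from order amounts, keyed by id.
    totals = {row["id"]: row["amount"] for row in orders}
    for row in refunds:
        totals[row["id"]] = totals.get(row["id"], 0.0) - row["amount"]
    return totals


# Each call below corresponds to one separately observable step.
result = net_amounts(transform(fetch_orders()), transform(fetch_refunds()))
print(result)  # {1: 8.0, 2: 4.0}
```

Because every step is a separate function with plain inputs and outputs, a failure in one source does not hide which part of the batch went wrong.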

Enterprise Scheduling

By linking with command shells, APIs, and enterprise execution containers, Data Ops can schedule work together with existing application tools.

It can orchestrate data pipeline services by communicating with existing services.
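The command shell integration mentioned above can be sketched as a task that runs an external command and reports whether it succeeded. This is a minimal standard-library sketch, not the Data Ops or Airflow implementation. The function name `run_command_task` and the 30-second default timeout are assumptions made for illustration.

```python
import subprocess
import sys


def run_command_task(argv, timeout_seconds=30):
    """Run one external command as a task.

    Returns a (succeeded, stdout) tuple. The command is passed as a list of
    arguments, so no shell parsing is involved.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )
    return completed.returncode == 0, completed.stdout.strip()


# Use the current Python interpreter as a portable stand-in for an existing tool.
ok, output = run_command_task([sys.executable, "-c", "print('batch step done')"])
print(ok, output)
```

A scheduler can use the success flag to decide whether downstream tasks should run or whether the task should be retried.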

Pre-requisites

None

Limitations

None

Considerations

To use Data Ops, an Ingress Controller must exist within the cluster.

Related Services

This is a list of Samsung Cloud Platform services related to the features or configurations described in this guide. Refer to it when selecting and designing services.

Service Group | Service | Detailed Description
Container | Kubernetes Engine | Kubernetes container orchestration service
Storage | File Storage | Storage that allows multiple client servers to share files through network connections
Storage | Object Storage | Object storage that is convenient for data storage and retrieval
Networking | VPC | Service that provides an independent virtual network in a cloud environment
Networking | Security Group | Virtual firewall that controls VM traffic
Networking | Load Balancer | Service that automatically distributes server traffic loads
Data Analytics | Data Flow | Service that automates data processing flows by extracting/transforming/transferring data from various sources
Table. Related Service List