Data Flow-Based System-to-System Large-Scale Data Transfer Automation

Overview

Data Flow is a tool for visually building processing flows that extract data from various data sources, transform or stream it, and transfer it to target systems. It is provided on top of the open-source Apache NiFi.

It can be used on its own in a Kubernetes Engine cluster on Samsung Cloud Platform, or together with other application software.

Architecture Diagram

(Architecture diagram image)
Figure. Data Flow-based large-scale data transfer automation
  1. The System Manager applies for the Data Flow service to automate the collection, transformation, and transmission of data between systems.

  2. The Data Engineer can modify Data Flow service settings through Flow Manager, and can manage the deployment of additional custom processors and Flow Template files.

  3. The Data Flow service is based on Apache NiFi. Flows built from various processors can be created and scheduled through a GUI, and the data processing flow can be checked visually.

    • It also uses NiFi Registry to provide version management and recovery for flows created in NiFi.

  4. Collected and transformed data can be transmitted to and stored in Cloud Hadoop (HDFS), PostgreSQL (DBaaS), Object Storage, and other targets.
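
Besides the GUI described in step 3, Apache NiFi exposes a REST API, and the sketch below shows one way a few flow counters could be read from it. It assumes the Data Flow service makes the standard NiFi endpoint `/nifi-api/flow/status` reachable; the base URL is hypothetical, authentication and TLS settings are omitted, and the field names follow the Apache NiFi 1.x controller status response as understood here, not anything stated in this guide.

```python
import json
from urllib.request import urlopen

# Hypothetical base URL; the real address depends on how the Ingress is configured.
NIFI_BASE_URL = "https://dataflow.example.com"


def summarize_controller_status(payload: dict) -> dict:
    """Pick a few counters out of a /nifi-api/flow/status response body."""
    status = payload.get("controllerStatus", {})
    return {
        "running_processors": status.get("runningCount", 0),
        "stopped_processors": status.get("stoppedCount", 0),
        "invalid_processors": status.get("invalidCount", 0),
        "flowfiles_queued": status.get("flowFilesQueued", 0),
        "bytes_queued": status.get("bytesQueued", 0),
    }


def fetch_controller_status(base_url: str = NIFI_BASE_URL) -> dict:
    """Fetch and summarize the status (authentication deliberately omitted)."""
    with urlopen(f"{base_url}/nifi-api/flow/status", timeout=10) as response:
        return summarize_controller_status(json.load(response))
```

Keeping the parsing in `summarize_controller_status` separate from the network call makes it possible to check the logic against a saved response without a running cluster.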

Use Cases

Data Transfer between Various Data Sources

It can process and transfer large-scale data between a wide range of sources and targets, including File, NoSQL, RDB, HDFS, JMS, FTP, SFTP, Kafka, and HTTP(S) REST.

Real-time Data Flow Control

You can check each data processing step in real time at the FlowFile level, and design error-handling cases to control the data transmission flow.
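
The error-handling idea can be illustrated with a small, self-contained sketch. It is not NiFi code: `FlowFile`, `route`, and `to_integer_string` are hypothetical names that only mimic how a NiFi processor sends each FlowFile (NiFi's unit of data) to a "success" or "failure" relationship, so that failed items can follow a separate path.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FlowFile:
    """Simplified stand-in for a NiFi FlowFile: content plus attributes."""
    content: str
    attributes: dict = field(default_factory=dict)


def route(flow_files: list, transform: Callable[[str], str]) -> dict:
    """Apply `transform` and sort each FlowFile into 'success' or 'failure'."""
    routed = {"success": [], "failure": []}
    for flow_file in flow_files:
        try:
            flow_file.content = transform(flow_file.content)
            routed["success"].append(flow_file)
        except ValueError as error:
            # Record the reason so a downstream error-handling path can act on it.
            flow_file.attributes["error.message"] = str(error)
            routed["failure"].append(flow_file)
    return routed


def to_integer_string(text: str) -> str:
    """Hypothetical transformation: normalize numeric text, reject anything else."""
    return str(int(text.strip()))


result = route([FlowFile(" 42 "), FlowFile("n/a")], to_integer_string)
print(len(result["success"]), len(result["failure"]))  # prints: 1 1
```

In an actual flow, the two lists correspond to separate connections drawn in the GUI, for example one leading to the target system and one leading to a retry or logging step.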

GUI-based Data Processing Flow Creation

You can create data extraction, transformation, and transmission tasks without coding by combining predefined processors through GUI-based drag-and-drop.

Flow Manager provides frequently used Flow Templates and lets users register and deploy additional custom Flow Templates.

Prerequisites

None

Limitations

None

Considerations

An Ingress Controller must exist in the cluster to use Data Flow.
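
One rough way to check this before applying for the service is to list the cluster's IngressClass resources. The sketch below assumes `kubectl` is installed and pointed at the target cluster; the helper names are hypothetical. A registered IngressClass only suggests that a controller is installed and does not prove that the controller itself is healthy.

```python
import json
import subprocess


def ingress_class_names(ingress_class_list: dict) -> list:
    """Return the names of the IngressClass objects in a `kubectl ... -o json` list."""
    return [item["metadata"]["name"] for item in ingress_class_list.get("items", [])]


def cluster_ingress_classes() -> list:
    """Query the current kubectl context for IngressClass resources."""
    completed = subprocess.run(
        ["kubectl", "get", "ingressclass", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return ingress_class_names(json.loads(completed.stdout))
```

An empty list from `cluster_ingress_classes()` is a hint that an Ingress Controller still needs to be installed.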

Related Services

The following Samsung Cloud Platform services are related to the features and configurations described in this guide. Refer to them when selecting and designing services.

Service Group | Service | Detailed Description
Container | Kubernetes Engine | Kubernetes container orchestration service
Storage | File Storage | Storage that allows multiple client servers to share files through network connections
Storage | Object Storage | Object storage suitable for data storage and search
Networking | VPC | Service that provides an independent virtual network in a cloud environment
Networking | Security Group | Virtual firewall that controls VM traffic
Networking | Load Balancer | Service that automatically distributes server traffic
Data Analytics | Data Ops | Service that automates workflow creation and task execution for data processing tasks
Table. Related Service List