Data Flow-Based System-to-System Large-Scale Data Transfer Automation
Overview
Data Flow is a data processing tool for visually building flows that extract data from various data sources, transform it, and transfer or stream it to a target. It is built on the open-source Apache NiFi.
It can be used on its own in a Kubernetes Engine cluster on Samsung Cloud Platform, or together with other application software.
Architecture Diagram
The System Manager applies for the Data Flow service to automate the collection, transformation, and transmission of system-to-system data.
The Data Engineer can modify the settings of the Data Flow service through Flow Manager, and can manage the deployment of additional custom processors and Flow Template files.
The Data Flow service is based on Apache NiFi. Flows can be created and scheduled with various processors through a GUI, and the data processing flow can be checked visually.
- Additionally, it uses NiFi Registry to provide version management and recovery functions for flows created in NiFi.
Collected and transformed data can be transmitted to and stored in Cloud Hadoop (HDFS), PostgreSQL (DBaaS), Object Storage, and other targets.
Use Cases
Data Transfer between Various Data Sources
It can process large-scale data and transfer it between a wide range of sources and targets, such as File, NoSQL, RDB, HDFS, JMS, FTP, SFTP, Kafka, and HTTP(S) REST.
Real-time Data Flow Control
You can check each data processing step in real time at the level of individual flow files, and design error-handling paths to control the data transmission flow.
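The error-handling idea can be illustrated outside of NiFi. In NiFi, a processor routes each flow file to a named relationship such as success or failure, and the flow designer decides where each relationship leads. The following is a minimal Python sketch of that routing pattern only. It does not use any NiFi API, and `FlowFile`, `route`, and the `error.message` attribute name are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Hypothetical stand-in for a flow file: content plus string attributes."""
    content: str
    attributes: Dict[str, str] = field(default_factory=dict)


def route(flow_files: List[FlowFile],
          transform: Callable[[str], str]) -> Dict[str, List[FlowFile]]:
    """Apply `transform` to each flow file and route it by outcome.

    Flow files that transform cleanly go to the "success" queue. Flow files
    whose transform raises ValueError go to the "failure" queue unchanged,
    with the error text recorded as an attribute for later inspection.
    """
    queues: Dict[str, List[FlowFile]] = {"success": [], "failure": []}
    for ff in flow_files:
        try:
            out = FlowFile(transform(ff.content), dict(ff.attributes))
            queues["success"].append(out)
        except ValueError as exc:
            failed = FlowFile(ff.content,
                              {**ff.attributes, "error.message": str(exc)})
            queues["failure"].append(failed)
    return queues


def double_number(text: str) -> str:
    """Example transform: parse an integer and double it (ValueError if invalid)."""
    return str(int(text) * 2)


queues = route([FlowFile("21"), FlowFile("abc")], double_number)
```

Here the first flow file lands in `queues["success"]` and the second in `queues["failure"]`. In an actual flow, the failure path would typically be connected to a retry, alerting, or quarantine step rather than a plain list.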
GUI-based Data Processing Flow Creation
You can create data extraction, transformation, and transmission tasks without coding using predefined data processing processors and GUI-based drag-and-drop.
Gallery-based Flow Template Management
Flow Manager provides frequently used Flow Templates and lets users register and deploy additional custom Flow Templates.
Pre-requisites
None
Limitations
None
Considerations
An Ingress Controller must exist in the cluster to use Data Flow.
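One way to check this consideration before applying for the service is to list the IngressClass resources in the cluster. The sketch below assumes `kubectl` is installed and configured for the target cluster. It also assumes that the Ingress Controller in use registers an IngressClass, which is common but not guaranteed, so an empty result is a prompt to investigate further rather than proof that no controller exists. The function names are hypothetical.

```python
import json
import subprocess
from typing import List


def ingress_class_names(kubectl_json: str) -> List[str]:
    """Return the names of IngressClass objects from `kubectl ... -o json` output."""
    doc = json.loads(kubectl_json)
    return [item["metadata"]["name"] for item in doc.get("items", [])]


def cluster_ingress_classes() -> List[str]:
    """Query the current kubectl context; needs kubectl and cluster access."""
    result = subprocess.run(
        ["kubectl", "get", "ingressclass", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return ingress_class_names(result.stdout)
```

Calling `cluster_ingress_classes()` against a cluster returns a list of names. The parsing step is kept in its own function so it can be exercised with saved output when no cluster is reachable.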
Related Services
This is a list of Samsung Cloud Platform services related to the features or configurations described in this guide. Please refer to them when selecting and designing services.
| Service Group | Service | Detailed Description |
|---|---|---|
| Container | Kubernetes Engine | Kubernetes container orchestration service |
| Storage | File Storage | Storage that allows multiple client servers to share files through network connections |
| Storage | Object Storage | Object storage suitable for data storage and search |
| Networking | VPC | Service that provides an independent virtual network in a cloud environment |
| Networking | Security Group | Virtual firewall that controls VM traffic |
| Networking | Load Balancer | Service that automatically distributes server traffic |
| Data Analytics | Data Ops | Service that automates workflow creation and task execution for data processing tasks |
