Data Flow based inter-system large-scale data transfer automation
Overview
Data Flow is a data processing tool based on the open-source Apache NiFi. It lets you visually build flows that extract data from various data sources and then transform and transmit that data as stream or batch data.
It can be used on its own in a Samsung Cloud Platform Kubernetes Engine cluster, or together with other application software.
Architecture Diagram
The System Manager requests the Data Flow service to automate the collection, transformation, and transmission of data between systems.
Data Engineers use Flow Manager to modify Data Flow service settings and to manage the deployment of additional custom processors and Flow Template files.
The Data Flow service is based on Apache NiFi. You can create and schedule flows built from various processors through a GUI and visually inspect how data moves through each flow.
- You can also use NiFi Registry to version flows created in NiFi and restore earlier versions.
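The following is a minimal sketch of reading a flow's version history from NiFi Registry over its REST API (`GET /nifi-registry-api/buckets/{bucketId}/flows/{flowId}/versions`). The host name and port in `REGISTRY_URL` are hypothetical placeholders. How a Data Flow deployment actually exposes NiFi Registry, and whether that endpoint requires authentication, are assumptions this sketch does not address.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the NiFi Registry address exposed by your deployment.
REGISTRY_URL = "http://nifi-registry.example.com:18080/nifi-registry-api"


def fetch_versions(bucket_id: str, flow_id: str, base_url: str = REGISTRY_URL) -> list:
    """Return the version metadata entries for one versioned flow (unauthenticated request)."""
    url = f"{base_url}/buckets/{bucket_id}/flows/{flow_id}/versions"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def latest_version(versions: list) -> dict:
    """Pick the metadata entry with the highest version number."""
    if not versions:
        raise ValueError("flow has no committed versions")
    return max(versions, key=lambda v: v["version"])
```

Restoring a flow in practice is done from the NiFi GUI by changing the process group to another version. The helper above only shows how the stored version numbers can be inspected.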
Collected and transformed data can be transmitted to and stored in Cloud Hadoop (HDFS), PostgreSQL (DBaaS), Object Storage, and other targets.
Use Cases
Data transfer between diverse data sources
You can process large volumes of data from various data sources and transmit it to a wide range of targets (File, NoSQL, RDB, HDFS, JMS, FTP, SFTP, Kafka, HTTP(S) REST, etc.).
Real-time Data Flow Control
You can follow each stage of real-time processing by tracking FlowFiles (NiFi's unit of data), and you can control how data is transmitted by designing error-handling paths into the flow.
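To illustrate the idea of an error-handling path, here is a plain-Python model, not NiFi code. A processing step routes each FlowFile to a `success` or `failure` relationship and retries before giving up. `FlowFile`, `run_stage`, `to_upper`, and the `error.message` attribute are hypothetical names chosen for this sketch. In NiFi itself, this routing is configured in the GUI by connecting a processor's relationships rather than written as code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    # Mirrors NiFi's split between attributes (metadata) and content (payload)
    attributes: Dict[str, str]
    content: bytes


def run_stage(
    flowfiles: List[FlowFile],
    process: Callable[[FlowFile], FlowFile],
    max_retries: int = 2,
) -> Dict[str, List[FlowFile]]:
    """Route each FlowFile to 'success' or 'failure', retrying failed attempts."""
    routed: Dict[str, List[FlowFile]] = {"success": [], "failure": []}
    for ff in flowfiles:
        for attempt in range(max_retries + 1):
            try:
                routed["success"].append(process(ff))
                break
            except Exception as exc:
                if attempt == max_retries:
                    # Keep the original content and record why processing failed
                    routed["failure"].append(
                        FlowFile({**ff.attributes, "error.message": str(exc)}, ff.content)
                    )
    return routed


def to_upper(ff: FlowFile) -> FlowFile:
    """Example transformation step that rejects empty payloads."""
    if not ff.content:
        raise ValueError("empty content")
    return FlowFile(dict(ff.attributes), ff.content.upper())
```

FlowFiles in the `failure` list would then feed a separate handling path, for example an alerting or quarantine step, while the `success` list continues to the next stage.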
Create GUI-based data processing flows
Using predefined data processing processors, you can build data extraction, transformation, and transfer tasks through GUI-based drag and drop, without writing code.
Gallery-based Flow Template management
Flow Manager provides a set of commonly used Flow Templates out of the box, and users can also register and deploy their own custom Flow Templates.
Prerequisites
None
Constraints
None
Considerations
To use Data Flow, an Ingress Controller must exist within the cluster.
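As a quick pre-installation check, you could list the IngressClass resources in the target cluster with `kubectl get ingressclass -o json`. The sketch below parses that output. It assumes `kubectl` is installed and pointed at the right cluster context. The function names are hypothetical. An existing IngressClass strongly suggests, but does not guarantee, that a working Ingress Controller is running, so also confirm that the controller's pods are healthy.

```python
import json
import subprocess
from typing import List


def ingress_class_names(kubectl_json: str) -> List[str]:
    """Extract IngressClass names from `kubectl get ingressclass -o json` output."""
    doc = json.loads(kubectl_json)
    return [item["metadata"]["name"] for item in doc.get("items", [])]


def cluster_has_ingress_class() -> bool:
    """Query the current kubectl context and report whether any IngressClass exists."""
    out = subprocess.run(
        ["kubectl", "get", "ingressclass", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return bool(ingress_class_names(out))
```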
Related services
The following Samsung Cloud Platform services are related to the features and configurations described in this guide. Refer to them when selecting and designing services.
| Service group | Service | Detailed description |
|---|---|---|
| Container | Kubernetes Engine | Kubernetes container orchestration service |
| Storage | File Storage | Storage that enables multiple client servers to share files over a network connection |
| Storage | Object Storage | Object storage that simplifies data storage and retrieval |
| Networking | VPC | A service that provides an isolated virtual network in a cloud environment |
| Networking | Security Group | Virtual firewall that controls VM traffic |
| Networking | Load Balancer | A service that automatically distributes server traffic load |
| Data Analytics | Data Ops | A service that creates workflows for data processing tasks and automates task execution |
