Data Flow
1 - Overview
Service Overview
Data Flow is a data processing flow tool, based on open-source Apache NiFi, that extracts large amounts of data from various data sources and lets you visually create processing flows for transforming and transmitting stream/batch data. Data Flow can be used independently in the Kubernetes Engine cluster environment of Samsung Cloud Platform or together with other application software.
Provided Features
Data Flow provides the following functions.
- Easy installation and management: Data Flow can be easily installed through the web-based Samsung Cloud Platform Console in a standard Kubernetes cluster environment. Based on open-source Apache NiFi, it automatically configures the architecture required for extensible clustering, and automatically installs ZooKeeper, Registry, and management modules. Through Data Flow, you can set up and deploy the setting files, NiFi templates, etc. required for service connection.
- Easy Data Flow Management: The processing flow of stream/batch data can be written easily in a GUI tailored to the user environment, enabling efficient data extraction, transmission, and processing between systems.
- NiFi Template Gallery: You can share/distribute reference NiFi templates. Data Flow provides a gallery of work files for data processing flows frequently used in the field, and users can share their own data processing flow tasks.
Component
Data Flow is composed of Manager and Service modules, and provides Apache NiFi as a package.
Data Flow Manager
Data Flow Manager provides various managing functions to utilize NiFi more efficiently.
- Through Data Flow Manager, customers can upload NAR files they created for use in NiFi processors, and upload setting files to share them.
- Frequently used NiFi templates are packaged as assets and provided in a gallery, where they can be used immediately with a single click.
- Provides real-time monitoring of service and resource status for the multiple services configured in the native NiFi service.
- You can easily provision setting information for NiFi configuration components within the cluster.
Data Flow Service
- It provides a data flow management service based on Apache NiFi.
- It automatically configures the architecture required for extensible clustering based on Apache NiFi, and the NiFi, ZooKeeper, and NiFi Registry modules are installed automatically.
- When creating NiFi, you can set the Description, resource size, access ID/password, and Host Alias.
- After creating the service, you can modify the Description, resource size, access password, Host Alias, etc. and apply the changes to the service.
Server spec type
When creating a Data Flow service, please check the following contents.
- Recommended Service Installation Specifications: CPU 21 core, Memory 57 GB, storage 100 GB or more
- The Ingress Controller must be installed before creating the Data Flow service.
- In a Kubernetes cluster, only 1 Ingress Controller can be installed.
- For more information, please refer to Ingress Controller installation.
Regional Provision Status
Data Flow is available in the following environments.
| Region | Availability |
|---|---|
| Korea West (kr-west1) | Provided |
| Korea East (kr-east1) | Provided |
| Korea South (kr-south1) | Not provided |
| Korea South 2 (kr-south2) | Not provided |
| Korea South 3 (kr-south3) | Not provided |
Preceding Service
This is a list of services that must be pre-configured before creating this service. Please refer to the guide provided for each service and prepare in advance.
| Service Category | Service | Detailed Description |
|---|---|---|
| Storage | File Storage | Storage that allows multiple client servers to share files through network connections |
| Container | Kubernetes Engine | Kubernetes container orchestration service |
2 - How-to guides
The user can enter the essential information of Data Flow through the Samsung Cloud Platform Console and create the service by selecting detailed options.
Creating Data Flow
You can create and use the Data Flow service in the Samsung Cloud Platform Console.
To create a Data Flow, follow the next procedure.
- Click All Services > Data Analytics > Data Flow. It moves to the Service Home page of Data Flow.
- On the Service Home page, click the Create Data Flow button. It moves to the Create Data Flow page.
- On the Create Data Flow page, enter the information needed to create the service and select detailed options.
In the Version Selection area, select the required information.

| Classification | Necessity | Detailed Description |
|---|---|---|
| Data Flow Version | Required | Select the version of the image to use<br>- A list of provided server image versions is shown |

Fig. Data Flow version selection items

In the Cluster Selection area, enter or select the required information. To install Data Flow, the nodes for the Kubernetes cluster and a workspace must be created first.

| Classification | Necessity | Detailed Description |
|---|---|---|
| Cluster Name | Required | Select the cluster to use |
| Ingress Controller | Required | Select the Ingress Controller installed in the cluster<br>- In the Details tab of the installed Ingress Controller, add the following to the ConfigMap item:<br>- Key: allow-snippet-annotations<br>- Value: true |

Fig. Data Flow cluster selection items
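The ConfigMap change described above can be sketched as a manifest fragment. This is only an illustration: the ConfigMap name and namespace below are assumptions and may differ in your cluster, so check the actual names in the Details tab of the installed Ingress Controller.

```yaml
# Illustrative sketch only: the ConfigMap name and namespace are assumptions.
# Check the actual names in the Details tab of the installed Ingress Controller.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller      # assumed name
  namespace: ingress-nginx            # assumed namespace
data:
  allow-snippet-annotations: "true"   # Key/Value required by Data Flow
```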
In the Service Information Input area, enter or select the required information.

| Classification | Necessity | Detailed Description |
|---|---|---|
| Data Flow Name | Required | Enter the Data Flow name<br>- 3 to 30 characters using lowercase English letters, numbers, and the special character (-); must start with a lowercase English letter and must not end with a special character (-) |
| Storage Class | Required | Select the storage class used by the chosen cluster |
| Description | Optional | Enter additional information or a description of the Data Flow within 150 characters |
| Domain Setting | Required | Enter the Data Flow domain<br>- 3 to 50 characters using lowercase English letters, numbers, and the special character (-); must start with a lowercase English letter and must not end with a special character (-)<br>- {Data Flow name}.{set domain} becomes the Data Flow access address |
| Node Selector | Required | To install on a specific node, enter a distinguishing label from the node's labels<br>- If the node label is entered incorrectly, an installation error may occur, so check the node label in advance<br>- The node label can be checked in the YAML file of the corresponding node |
| Account | Required | Enter the Data Flow Manager account<br>- ID: 6 to 30 characters starting with a lowercase English letter, using lowercase English letters and numbers<br>- Password: 8 to 50 characters including uppercase English letters, lowercase English letters, numbers, and special characters (!@#$%^&*)<br>- Password Confirmation: Enter the same password once more |
| Host Alias | Optional | Add host information to connect to Data Flow (up to 20 entries, including the default)<br>- Select Use, then click the + button<br>- Hostname: Enter a hostname or domain of 3 to 63 characters using lowercase English letters, numbers, and the special character (-)<br>- IP: Enter in IP format<br>- To delete an entry, click the X button<br>- The firewall between the cluster and the target server must be open to use the added host information |

Fig. Data Flow service information input items
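As noted for the Node Selector item, the node label can be checked in the node's YAML. A hypothetical excerpt is shown below; the node name and the `dataflow: enabled` label are examples only, not values from your cluster.

```yaml
# Hypothetical node manifest excerpt; the node name and the
# dataflow: enabled label are examples only.
apiVersion: v1
kind: Node
metadata:
  name: worker-node-1
  labels:
    kubernetes.io/hostname: worker-node-1
    dataflow: enabled    # a label like this can be entered as the Node Selector
```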
In the Additional Information Input area, enter or select the required information.

| Classification | Necessity | Detailed Description |
|---|---|---|
| Tag | Optional | Add tags<br>- Click the Add Tag button to create and add new tags or add existing tags<br>- Up to 50 tags can be added<br>- Newly added tags are applied after service creation is complete |

Fig. Data Flow additional information input items
In the Summary panel, review the detailed information and estimated charges, then click the Complete button.
- Once creation is complete, check the created resource on the Data Flow list page.
Check Data Flow Detailed Information
You can check and modify the list of all resources and detailed information of Data Flow. The Data Flow details page consists of detailed information, tags, and work history tabs.
To check the detailed information of Data Flow, follow the next procedure.
- Click All Services > Data Analytics > Data Flow. It moves to the Service Home page of Data Flow.
- On the Service Home page, click the Data Flow menu. It moves to the Data Flow list page.
- On the Data Flow list page, click the resource whose details you want to check. It moves to the Data Flow details page.
- The top of the Data Flow details page shows status information and additional function buttons.
| Classification | Detailed Description |
|---|---|
| Status Display | Data Flow status |
| Hosts file setting information | Button to check and copy host file information to access Data Flow |
| Service Cancellation | Button to cancel the service |
Detailed Information
On the Data Flow List page, you can check the detailed information of the selected resource and modify the information if necessary.
| Classification | Detailed Description |
|---|---|
| Service | Service name |
| Resource Type | Resource type |
| SRN | Unique resource ID in Samsung Cloud Platform |
| Resource Name | Resource name |
| Resource ID | Unique resource ID in the service |
| Creator | User who created the service |
| Creation Time | Time when the service was created |
| Modifier | User who modified the service information |
| Revision Time | Time when service information was revised |
| Cluster Name | Name of the cluster where Data Flow is installed |
| Storage Class | Storage class used by the selected cluster |
| Description | Additional information or description about Data Flow |
| Domain Setting | Data Flow Domain Name |
| Node Selector | Node Label |
| Web URL | Data Flow URL |
| Account | Data Flow Manager account |
| Host Alias | Host information to be connected to Data Flow |
Tag
On the Data Flow List page, you can check the tag information of the selected resource, and add, change, or delete it.
| Classification | Detailed Description |
|---|---|
| Tag list | Tag list |
Work History
You can check the work history of the selected resource on the Data Flow list page.
| Classification | Detailed Description |
|---|---|
| Work history list | Resource change history |

Fig. Data Flow work history tab detailed information items
Data Flow cancellation
You can cancel unused Data Flow to reduce operating costs. However, canceling the service may immediately stop the running service, so fully consider the impact of stopping the service before proceeding with cancellation.
To cancel Data Flow, follow the next procedure.
- Click All Services > Data Analytics > Data Flow. It moves to the Service Home page of Data Flow.
- On the Service Home page, click the Data Flow menu. It moves to the Data Flow list page.
- On the Data Flow list page, select the resource to cancel and click the Service Cancellation button.
- Once the cancellation is complete, check the Data Flow list page to see if the resource has been cancelled.
- To cancel Data Flow, you must first delete the connected Data Flow Services.
- Data Flow will be cancelled, and the created namespace will also be deleted.
2.1 - Data Flow Services
The user can enter the essential information of Data Flow Services in the Data Flow service through the Samsung Cloud Platform Console and create the service by selecting detailed options.
Create Data Flow Services
The user can add a service by selecting the detailed options of the Data Flow service or entering the setting value.
To create Data Flow Services, follow these steps.
- Click All Services > Data Analytics > Data Flow. It moves to the Service Home page of Data Flow.
- On the Service Home page, click Data Flow Services. It moves to the Data Flow Services list page.
- On the Data Flow Services list page, click the Create Data Flow Services button. It moves to the Create Data Flow Services page.
- On the Create Data Flow Services page, enter the information required for service creation and select detailed options.

In the Service Information Input area, enter or select the required information.
| Classification | Necessity | Detailed Description |
|---|---|---|
| Data Flow Name | Required | Select the Data Flow to use |
| Data Flow Services Name | Required | Enter the Data Flow Services name<br>- 3 to 30 characters using lowercase English letters, numbers, and the special character (-); must start with a lowercase English letter and must not end with a special character (-) |
| Storage Class | Required | Select the storage class used by the selected cluster |
| Description | Optional | Enter additional information or a description of the Data Flow Services within 150 characters |
| Domain Setting | Required | Enter the Data Flow Services domain<br>- 3 to 50 characters using lowercase English letters, numbers, and the special character (-); must start with a lowercase English letter and must not end with a special character (-)<br>- {Data Flow Services name}.{set domain} becomes the Data Flow Services access address |
| Node Selector | Required | To install on a specific node, enter a distinguishing label from the node's labels<br>- If the node label is entered incorrectly, an installation error may occur, so check the node label in advance<br>- The node label can be checked in the YAML file of the corresponding node |
| Service Workload | Required | - NiFi: A module that provides the Apache NiFi service and UI<br>- NiFi Registry: A module for setting up and deploying NiFi templates<br>- ZooKeeper: A module that supports distributed processing of NiFi across multiple nodes |
| Account | Required | Enter the NiFi account<br>- ID: 6 to 30 characters starting with a lowercase English letter, using lowercase English letters and numbers<br>- Password: 8 to 50 characters including uppercase English letters, lowercase English letters, numbers, and special characters (!@#$%^&*)<br>- Password Confirmation: Enter the same password once more |

Fig. Data Flow Services service information input items
In the Additional Information Input area, enter or select the required information.

| Classification | Necessity | Detailed Description |
|---|---|---|
| Host Alias | Optional | Add host information to connect to Data Flow Services (up to 20 entries, including the default)<br>- Select Use, then click the + button<br>- Hostname: Enter a hostname or domain of 3 to 63 characters using lowercase English letters, numbers, and the special character (-)<br>- IP: Enter in IP format<br>- To delete an entry, click the X button<br>- The firewall between the cluster and the target server must be open to use the added host information |
| Tag | Optional | Add tags<br>- Click the Add Tag button to create and add new tags or add existing tags<br>- Up to 50 tags can be added<br>- Newly added tags are applied after service creation is complete |

Fig. Data Flow Services additional information input items
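For reference, Kubernetes exposes host entries like the Host Alias items above through the pod spec's `hostAliases` field, which writes them to the pod's /etc/hosts. The sketch below is a hypothetical illustration; the hostname and IP are placeholders, not values from the service.

```yaml
# Hypothetical sketch: how a Host Alias entry typically appears as a
# Kubernetes hostAliases item. Hostname and IP are placeholders.
spec:
  hostAliases:
    - ip: "192.168.0.10"           # placeholder IP
      hostnames:
        - "source-db.internal"     # placeholder hostname
```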
In the Summary panel, review the detailed information and estimated charges, and click the Complete button.
- Once creation is complete, check the created resource on the Data Flow Services list page.
Check Data Flow Services Detailed Information
You can check and modify the list of all resources and detailed information of Data Flow Services. The Data Flow Services details page consists of details, tags, and operation history tabs.
To check the detailed information of Data Flow Services, follow the next procedure.
- Click All Services > Data Analytics > Data Flow. It moves to the Service Home page of Data Flow.
- On the Service Home page, click the Data Flow Services menu. It moves to the Data Flow Services list page.
- On the Data Flow Services list page, click the resource whose details you want to check. It moves to the Data Flow Services details page.
- The top of the Data Flow Services details page displays status information and additional function buttons.
| Classification | Detailed Description |
|---|---|
| Status Display | Data Flow Services status |
| Hosts file setting information | A button to check and copy host file information to access Data Flow Services |
| Data Flow Services deletion | Button to cancel the service |
Detailed Information
On the Data Flow Services list page, you can check the detailed information of the selected resource and modify the information if necessary.
| Classification | Detailed Description |
|---|---|
| Service | Service Name |
| Resource Type | Resource Type |
| SRN | Unique resource ID in Samsung Cloud Platform |
| Resource Name | Resource Name |
| Resource ID | Unique resource ID in the service |
| Creator | Service creator user |
| Creation Time | The time when the service was created |
| Modifier | User who modified the service information |
| Modified Time | Time when service information was modified |
| Data Flow Name | Data Flow Name |
| Storage Class | Storage class used by the selected cluster |
| Description | Additional information or description about Data Flow Services |
| Domain Setting | Data Flow Services domain name |
| Node Selector | Node Label |
| Web URL | Data Flow Services URL |
| Account | NiFi account |
| Host Alias | Host information to be connected to Data Flow Services |
Tag
On the Data Flow Services List page, you can check the tag information of the selected resource, and add, change, or delete it.
| Classification | Detailed Description |
|---|---|
| Tag list | Tag list |
Work History
You can check the operation history of the selected resource on the Data Flow Services list page.
| Classification | Detailed Description |
|---|---|
| Work history list | Resource change history |
Cancel Data Flow Services
You can cancel unused Data Flow Services to reduce operating costs. However, canceling the service may immediately stop the running service, so fully consider the impact of stopping the service before proceeding with cancellation.
To cancel Data Flow Services, follow the procedure below.
- Click All Services > Data Analytics > Data Flow menu. It moves to the Service Home page of Data Flow.
- On the Service Home page, click the Data Flow Services menu. It moves to the Data Flow Services list page.
- On the Data Flow Services list page, select the resource to cancel and click the Data Flow Services Delete button.
- Once the cancellation is complete, please check if the resource has been cancelled on the Data Flow Services list page.
- Data Flow Services will be cancelled, and the created namespace will also be deleted.
2.2 - Installing Ingress Controller
The user must install the Ingress Controller before creating the Data Flow service. Only one Ingress Controller can be installed in a Kubernetes cluster.
Install Ingress Controller using Container Registry
To install Ingress Controller using Container Registry, follow the procedure below.
- Prepare an SCR (Samsung Container Registry) to store the Ingress Controller image.
- Push the Ingress Controller image to the SCR.
- Download the YAML file used for installation from the Ingress Controller GitHub repository and modify the following items.
```yaml
kind: Deployment
...
spec:
  template:
    spec:
      containers:
        image: {SCR private endpoint}.{repository name}.{image name}:{tag}
```

```yaml
kind: ConfigMap
...
metadata:
  labels:
    app: ingress-controller
---
kind: Service
...
metadata:
  labels:
    app: ingress-controller
---
kind: Deployment
...
metadata:
  labels:
    app: ingress-controller
---
kind: IngressClass
...
metadata:
  labels:
    app: ingress-controller
```

- Using the modified YAML file, you can install the Ingress Controller with the Create Object button in the Workloads > Deployments list of Kubernetes Engine.
