1 - Overview

Service Overview

Data Flow is a data processing flow tool, based on open-source Apache NiFi, that extracts large amounts of data from various data sources and lets you visually build processing flows for transforming and transmitting stream/batch data. Data Flow can be used on its own in a Samsung Cloud Platform Kubernetes Engine cluster environment or together with other application software.

Figure. Data Flow architecture diagram

Provided Features

Data Flow provides the following functions.

  • Easy installation and management: Data Flow can be easily installed on a standard Kubernetes cluster through the web-based Samsung Cloud Platform Console. Based on open-source Apache NiFi, it automatically configures the architecture required for scalable clustering and installs the ZooKeeper, Registry, and management modules automatically. Through Data Flow, you can configure and deploy the configuration files, NiFi templates, and other items required for service integration.
  • Easy data flow management: Stream/batch data processing flows can be composed easily in a GUI tailored to the user environment, enabling efficient data extraction, transmission, and processing between systems.
  • NiFi Template Gallery: You can share and distribute reference NiFi templates. Data Flow provides a gallery of flow files for data processing flows frequently used in the field, and users can also share their own data processing flows.

Component

Data Flow is composed of Manager and Service modules, and provides Apache NiFi as a package.

Data Flow Manager

Data Flow Manager provides various management functions for using NiFi more efficiently.

  • Through Data Flow Manager, customers can upload NAR files they have created and use them in processors, and can upload and share configuration files.
  • Frequently used NiFi templates are packaged as assets and provided in a gallery, where they can be applied immediately with a single click.
  • Provides real-time monitoring and resource status monitoring for the multiple services configured for the native NiFi service.
  • You can easily provision configuration information for the NiFi components within the cluster.

Data Flow Service

  • Provides a data flow management service based on Apache NiFi.
  • Automatically configures the architecture required for scalable clustering based on Apache NiFi; the NiFi, ZooKeeper, and NiFi Registry modules are installed automatically.
  • When NiFi is provisioned, you can set the description, resource size, access ID/password, and Host Alias.
  • After creating the service, you can modify the description, resource size, access password, Host Alias, and other settings and apply them to the service.

Server Specifications

When creating a Data Flow service, check the following.

  • Recommended installation specifications: 21 CPU cores, 57 GB of memory, and 100 GB or more of storage
Reference
  • The Ingress Controller must be installed before creating the Data Flow service.
  • Only one Ingress Controller can be installed in a Kubernetes cluster.
  • For more information, refer to Installing Ingress Controller.

Regional Provision Status

Data Flow is available in the following environments.

Region | Availability
Korea West (kr-west1) | Provided
Korea East (kr-east1) | Provided
Korea South (kr-south1) | Not provided
Korea South 2 (kr-south2) | Not provided
Korea South 3 (kr-south3) | Not provided
Table. Data Flow Provision Status by Region

Preceding Service

The following services must be configured in advance before creating this service. Refer to the guide provided for each service and prepare them beforehand.

Service Category | Service | Detailed Description
Storage | File Storage | Storage that allows multiple client servers to share files through network connections
Container | Kubernetes Engine | Kubernetes container orchestration service
Table. Preceding services for Data Flow

2 - How-to guides

The user can enter the essential information of Data Flow through the Samsung Cloud Platform Console and create the service by selecting detailed options.

Creating Data Flow

You can create and use the Data Flow service in the Samsung Cloud Platform Console.

To create a Data Flow, follow the procedure below.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.

  2. On the Service Home page, click the Create Data Flow button. You are taken to the Create Data Flow page.

  3. On the Create Data Flow page, enter the information needed to create the service and select detailed options.

    • In the Version Selection area, select the necessary information.

      Classification | Necessity | Detailed Description
      Data Flow version | Required | Select the version of the server image to use
      • Provides a list of the available server image versions
      Table. Data Flow version selection items

    • In the Cluster Selection area, enter or select the required information. To install Data Flow, the Kubernetes cluster nodes and a workspace must be created first.

      Classification | Necessity | Detailed Description
      Cluster Name | Required | Select the cluster to use
      Ingress Controller | Required | Select the Ingress Controller installed in the cluster
      • In the Details tab of the installed Ingress Controller, add the following information to the ConfigMap item (see the kubectl sketch after this table):
        • Key: allow-snippet-annotations
        • Value: true
      Table. Data Flow cluster selection items
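      The ConfigMap entry above can also be added outside the console. The following is a minimal sketch assuming direct kubectl access to the cluster; the ConfigMap name and namespace (ingress-nginx-controller, ingress-nginx) are assumptions and may differ in your installation.

      # Find the Ingress Controller ConfigMap first; the name and namespace below are assumptions.
      kubectl get configmap -n ingress-nginx

      # Add allow-snippet-annotations=true to that ConfigMap.
      kubectl patch configmap ingress-nginx-controller -n ingress-nginx \
        --type merge -p '{"data":{"allow-snippet-annotations":"true"}}'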

    • In the Service Information Input area, enter or select the necessary information.

      Classification | Necessity | Detailed Description
      Data Flow name | Required | Enter the Data Flow name
      • Must start with a lowercase letter and must not end with the special character (-); use lowercase letters, numbers, and the special character (-) to enter 3 to 30 characters
      Storage Class | Required | Select the storage class used by the chosen cluster
      Description | Optional | Enter additional information or a description of the Data Flow within 150 characters
      Domain setting | Required | Enter the Data Flow domain
      • Must start with a lowercase letter and must not end with the special character (-); use lowercase letters, numbers, and the special character (-) to enter 3 to 50 characters
      • {Data Flow name}.{set domain} becomes the Data Flow access address
      Node Selector | Required | To install on a specific node, enter a label that identifies the node among the node's labels
      • If the node label is entered incorrectly, an installation error may occur, so check the node label in advance (see the kubectl sketch after this table)
      • The node label can be checked in the YAML file of the corresponding node
      Account | Required | Enter the Data Flow Manager account
      • ID: Must start with a lowercase letter; use lowercase letters and numbers to enter 6 to 30 characters
      • Password: Must include uppercase letters, lowercase letters, numbers, and special characters (!@#$%^&*); enter 8 to 50 characters
      • Password Confirmation: Enter the same password once more
      Host Alias | Optional | Add host information to be connected to Data Flow (up to 20 entries, including the default)
      • Select Use, then click the + button
      • Hostname: Enter in hostname or domain format, using lowercase letters, numbers, and the special character (-), 3 to 63 characters
      • IP: Enter in IP format
      • To delete an entry, click the X button
      • The firewall between the cluster and the target server must be open for the added host information to be usable
      Table. Data Flow service information input items
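      The node label used for Node Selector can be checked in advance. The commands below are a minimal sketch assuming kubectl access to the cluster; the node name and the dataflow=true label are hypothetical placeholders.

      # List every node with its labels and pick one that identifies the target node
      kubectl get nodes --show-labels

      # Or inspect a single node's YAML; labels appear under metadata.labels (node name is a placeholder)
      kubectl get node <node-name> -o yaml

      # Optionally add a dedicated label to use as the Node Selector (hypothetical key/value)
      kubectl label node <node-name> dataflow=true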

    • In the Additional Information Input area, enter or select the necessary information.

      Classification | Necessity | Detailed Description
      Tag | Optional | Add tags
      • Click the Add Tag button to create and add new tags or add existing tags
      • Up to 50 tags can be added
      • Newly added tags are applied after service creation is complete
      Table. Data Flow additional information input items

  4. In the Summary panel, review the detailed information and estimated charges, then click the Complete button.

    • Once creation is complete, check the created resource on the Data Flow list page.

Check Data Flow Detailed Information

You can check and modify the list of all Data Flow resources and the detailed information of each resource. The Data Flow details page consists of the Detailed Information, Tag, and Work History tabs.

To check the detailed information of Data Flow, follow the procedure below.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.
  2. On the Service Home page, click the Data Flow menu. You are taken to the Data Flow list page.
  3. On the Data Flow list page, click the resource whose details you want to check. You are taken to the Data Flow details page.
    • The top of the Data Flow details page shows status information and additional functions.
Classification | Detailed Description
Status Display | Data Flow status
  • Creating: being created
  • Running: in operation; Data Flow Services can be created
  • Updating: settings are being updated
  • Terminating: the service is being terminated
  • Error: an error occurred during creation or the service is in an abnormal state
Hosts file setting information | Button to check and copy the hosts file information needed to access Data Flow (see the example after this table)
Service Cancellation | Button to cancel the service
Table. Data Flow status information and additional functions
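As a minimal sketch, the information copied with the Hosts file setting information button can be appended to a client's /etc/hosts file so that the {Data Flow name}.{set domain} address resolves. The IP address and hostname below are hypothetical placeholders; use the values actually shown by the button.

# Hypothetical values: replace the IP and hostname with the copied hosts file information
echo "198.51.100.10  mydataflow.example.com" | sudo tee -a /etc/hosts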

Detailed Information

On the Data Flow List page, you can check the detailed information of the selected resource and modify the information if necessary.

Classification | Detailed Description
Service | Service category
Resource Type | Service name
SRN | Unique resource ID in Samsung Cloud Platform
  • Means the cluster SRN
Resource Name | Resource name
  • Means the cluster name
Resource ID | Unique resource ID within the service
Creator | User who created the service
Creation Time | Time when the service was created
Modifier | User who modified the service information
Revision Time | Time when the service information was revised
Cluster Name | Name of the cluster composed of servers
Storage Class | Storage class used by the selected cluster
Description | Additional information or description about the Data Flow
Domain Setting | Data Flow domain name
Node Selector | Node label
Web Url | Data Flow URL
Account | Data Flow Manager account
Host Alias | Host information connected to Data Flow
Table. Data Flow detailed information tab items

Tag

On the Data Flow List page, you can check the tag information of the selected resource, and add, change, or delete it.

Classification | Detailed Description
Tag list | Tag list
  • Check the Key and Value information of each tag
  • Up to 50 tags can be added per resource
  • When entering a tag, search and select from the existing Key and Value list
Table. Data Flow tag tab items

Work History

You can check the work history of the selected resource on the Data Flow list page.

Classification | Detailed Description
Work history list | Resource change history
  • Check the work time, resource ID, resource name, work details, event topic, work result, and worker information
Table. Data Flow work history tab items

Data Flow cancellation

You can cancel an unused Data Flow to reduce operating costs. However, canceling the service may stop the running service immediately, so carefully consider the impact of stopping the service before proceeding with cancellation.

To cancel Data Flow, follow the procedure below.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.
  2. On the Service Home page, click the Data Flow menu. You are taken to the Data Flow list page.
  3. On the Data Flow list page, select the resource to cancel and click the Service Cancellation button.
  4. Once the cancellation is complete, check on the Data Flow list page that the resource has been cancelled.
Notice
  • To cancel a Data Flow, you must first delete the connected Data Flow Services.
  • When the Data Flow is cancelled, the created namespace is also deleted.

2.1 - Data Flow Services

The user can enter the essential information of Data Flow Services in the Data Flow service through the Samsung Cloud Platform Console and create the service by selecting detailed options.

Create Data Flow Services

The user can add a service by selecting the detailed options of the Data Flow service or entering the setting value.

Notice
When requesting Data Flow Services, the Kubernetes cluster must have enough available capacity to accommodate the requested resources (see the kubectl sketch below).
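The following is a minimal sketch for checking the remaining capacity before requesting the service, assuming kubectl access to the cluster; kubectl top additionally assumes the metrics-server add-on is installed.

# Show how much CPU/memory is already requested and what remains allocatable on each node
kubectl describe nodes | grep -A 8 "Allocated resources"

# Show current CPU/memory usage per node (requires the metrics-server add-on)
kubectl top nodes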

To create Data Flow Services, follow these steps.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.

  2. On the Service Home page, click Data Flow Services. You are taken to the Data Flow Services list page.

  3. On the Data Flow Services list page, click the Create Data Flow Services button. You are taken to the Create Data Flow Services page.

  4. On the Create Data Flow Services page, enter the information required for service creation and select detailed options.

    • In the Service Information Input area, enter or select the required information.

      Classification | Necessity | Detailed Description
      Data Flow name | Required | Select the Data Flow
      Flow Service name | Required | Enter the Data Flow Services name
      • Must start with a lowercase letter and must not end with the special character (-); use lowercase letters, numbers, and the special character (-) to enter 3 to 30 characters
      Storage Class | Required | Select the storage class used by the selected cluster
      Description | Optional | Enter additional information or a description of the Data Flow Services within 150 characters
      Domain Setting | Required | Enter the Data Flow Services domain
      • Must start with a lowercase letter and must not end with the special character (-); use lowercase letters, numbers, and the special character (-) to enter 3 to 50 characters
      • {Data Flow Services name}.{set domain} becomes the Data Flow Services access address
      Node Selector | Required | To install on a specific node, enter a label that identifies the node among the node's labels
      • If the node label is entered incorrectly, an installation error may occur, so check the node label in advance
      • The node label can be checked in the YAML file of the corresponding node
      Service Workload | Required | Select the service workload modules
      • NiFi: A module that provides the Apache NiFi service and UI
      • NiFi Registry: A module for configuring and deploying NiFi templates
      • ZooKeeper: A module that supports distributed processing of NiFi across multiple nodes
      Account | Required | Enter the NiFi account
      • ID: Must start with a lowercase letter; use lowercase letters and numbers to enter 6 to 30 characters
      • Password: Must include uppercase letters, lowercase letters, numbers, and special characters (!@#$%^&*); enter 8 to 50 characters
      • Password Confirmation: Enter the same password once more
      Table. Data Flow Services service information input items

    • In the Additional Information Input area, enter or select the required information.

      Classification | Necessity | Detailed Description
      Host Alias | Optional | Add host information to be connected to Data Flow Services (up to 20 entries, including the default)
      • Select Use, then click the + button
      • Hostname: Enter in hostname or domain format, using lowercase letters, numbers, and the special character (-), 3 to 63 characters
      • IP: Enter in IP format
      • To delete an entry, click the X button
      • The firewall between the cluster and the target server must be open for the added host information to be usable
      Tag | Optional | Add tags
      • Click the Add Tag button to create and add new tags or add existing tags
      • Up to 50 tags can be added
      • Newly added tags are applied after service creation is complete
      Table. Data Flow Services additional information input items

  5. In the Summary panel, review the detailed information and estimated charges, and click the Complete button.

    • Once creation is complete, check the created resource on the Data Flow Services list page.
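    Besides the console list page, the workloads created for the service can also be checked with kubectl if you have access to the cluster. This is a minimal sketch; the namespace name is a placeholder for the namespace created automatically for the Data Flow Services.

    # The namespace below is a placeholder; a namespace is created automatically for the service
    kubectl get pods -n <data-flow-services-namespace>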

Check Data Flow Services Detailed Information

You can check and modify the list of all Data Flow Services resources and the detailed information of each resource. The Data Flow Services details page consists of the Detailed Information, Tag, and Work History tabs.

To check the detailed information of Data Flow Services, follow the procedure below.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.
  2. On the Service Home page, click the Data Flow Services menu. You are taken to the Data Flow Services list page.
  3. On the Data Flow Services list page, click the resource whose details you want to check. You are taken to the Data Flow Services details page.
    • The top of the Data Flow Services details page shows status information and additional functions.
Classification | Detailed Description
Status Display | Data Flow Services status
  • Creating: being created
  • Running: in operation
  • Updating: settings are being updated
  • Terminating: the service is being terminated
  • Error: creation failed or the service is unavailable
Hosts file setting information | Button to check and copy the hosts file information needed to access Data Flow Services
Data Flow Services deletion | Button to delete the service
Table. Data Flow Services status information and additional functions

Detailed Information

On the Data Flow Services list page, you can check the detailed information of the selected resource and modify the information if necessary.

Classification | Detailed Description
Service | Service name
Resource Type | Resource type
SRN | Unique resource ID in Samsung Cloud Platform
  • Means the cluster SRN
Resource Name | Resource name
  • Means the cluster name
Resource ID | Unique resource ID within the service
Creator | User who created the service
Creation Time | Time when the service was created
Modifier | User who modified the service information
Modified Time | Time when the service information was modified
Data Flow Name | Data Flow name
Storage Class | Storage class used by the selected cluster
Description | Additional information or description about the Data Flow Services
Domain Setting | Data Flow Services domain name
Node Selector | Node label
Web Url | Data Flow Services URL
Account | NiFi account
Host Alias | Host information connected to Data Flow Services
Table. Data Flow Services detailed information tab items

Tag

On the Data Flow Services List page, you can check the tag information of the selected resource, and add, change, or delete it.

Classification | Detailed Description
Tag list | Tag list
  • Check the Key and Value information of each tag
  • Up to 50 tags can be added per resource
  • When entering a tag, search and select from the existing Key and Value list
Table. Data Flow Services tag tab items

Work History

You can check the operation history of the selected resource on the Data Flow Services list page.

Classification | Detailed Description
Work history list | Resource change history
  • Check the work date, resource ID, resource name, work details, event topic, work result, and worker information
Table. Data Flow Services work history tab items

Cancel Data Flow Services

You can cancel unused Data Flow Services to reduce operating costs. However, canceling the service may stop the running service immediately, so carefully consider the impact of stopping the service before proceeding with cancellation.

To cancel Data Flow Services, follow the procedure below.

  1. Click the All Services > Data Analytics > Data Flow menu. You are taken to the Data Flow Service Home page.
  2. On the Service Home page, click the Data Flow Services menu. You are taken to the Data Flow Services list page.
  3. On the Data Flow Services list page, select the resource to cancel and click the Data Flow Services delete button.
  4. Once the cancellation is complete, check on the Data Flow Services list page that the resource has been cancelled.
Notice
  • When the Data Flow Services are cancelled, the created namespace is also deleted.

2.2 - Installing Ingress Controller

You must install the Ingress Controller before creating the Data Flow service. Only one Ingress Controller can be installed in a Kubernetes cluster.

Install Ingress Controller using Container Registry

To install Ingress Controller using Container Registry, follow the procedure below.

For detailed Container Registry creation methods, please refer to the Container > Container Registry > How-to guides.
  1. Prepare the SCR (Samsung Container Registry) to store the Ingress Controller image.
  2. Push the Ingress Controller image to SCR (Samsung Container Registry). (A CLI sketch for steps 1 and 2 is included at the end of this section.)
  3. Download the YAML file used for installation from the Ingress Controller's GitHub repository and modify the following items.
kind: Deployment
...
spec:
  template:
    spec:
      containers:
        - image: {SCR private endpoint}/{repository name}/{image name}:{tag}
Code Block. SCR Information Change
kind: ConfigMap
...
metadata:
  labels:
    app: ingress-controller

kind: Service
...
metadata:
  labels:
    app: ingress-controller

kind: Deployment
...
metadata:
  labels:
    app: ingress-controller

kind: IngressClass
...
metadata:
  labels:
    app: ingress-controller
Code block. Label information added (metadata.labels.app: ingress-controller)
  4. Install the Ingress Controller by applying the modified YAML file with the Create Object button in the Workloads > Deployments list of Kubernetes Engine.
Reference
For detailed object creation methods, please refer to Container > Kubernetes Engine > Creating Deployments.
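The following is a minimal CLI sketch of pushing the image to SCR (steps 1 and 2) and then confirming that the objects created from the modified YAML exist, assuming the Docker CLI and kubectl access to the cluster. The registry endpoint, repository, image name, tag, and the ingress-nginx source image are assumptions or placeholders; replace them with your own values.

# Push the Ingress Controller image to SCR (all values in braces are placeholders)
docker login {SCR private endpoint}
docker pull registry.k8s.io/ingress-nginx/controller:{tag}        # assumed source image
docker tag registry.k8s.io/ingress-nginx/controller:{tag} \
  {SCR private endpoint}/{repository name}/{image name}:{tag}
docker push {SCR private endpoint}/{repository name}/{image name}:{tag}

# After creating the objects, list the resources carrying the app=ingress-controller label
kubectl get configmap,service,deployment,ingressclass -A -l app=ingress-controller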

3 - API Reference

API Reference

4 - CLI Reference

CLI Reference

5 - Release Note

Data Flow

2025.04.28
NEW Official Release of Data Flow Service
  • The Data Flow service, which extracts/transforms/transfers data from various sources and automates data processing flows, has been released.
  • It provides open-source Apache NiFi.