Overview

Service Overview

Data Flow is a data processing workflow tool built on the open-source Apache NiFi. It lets you visually create processing flows that extract large volumes of data from various data sources and that transform and transmit stream and batch data. Data Flow can be used on its own in the Kubernetes Engine cluster environment of Samsung Cloud Platform, or together with other application software.

Diagram
Figure. Data Flow Diagram

Provided features

Data Flow provides the following features.

  • Convenient Installation and Management: Data Flow can be easily installed in a standard Kubernetes cluster environment via the web-based Samsung Cloud Platform Console. It automatically configures the architecture required for scalable clustering based on the open-source Apache NiFi, automatically installing ZooKeeper, Registry, and management modules. With Data Flow, you can configure and deploy configuration files, NiFi templates, and other assets needed for service integration.
  • Easy Data Flow Management: You can create processing flows for stream/batch data in a GUI suited to the user environment, which lets you efficiently extract, transmit, and process data between systems.
  • NiFi Template Gallery: You can share/distribute reference NiFi templates. Data Flow provides work files for data processing flows commonly used in the field as a gallery, and users can share the data processing flow work they have created.

Components

Data Flow consists of Manager and Service modules, and is provided packaged with Apache NiFi.

Data Flow Manager

Data Flow Manager provides various management functions to enable more efficient use of NiFi.

  • You can upload customer-built NAR files through Data Flow Manager for use in Processors, and upload configuration files to share them.
  • Frequently used NiFi templates are packaged as assets and offered in the Gallery, ready for use with a single click.
  • Provides real-time monitoring of multiple services configured for the native NiFi service, as well as resource status monitoring.
  • You can easily provision configuration information for NiFi components within the cluster.

Data Flow Service

  • Provides a data flow management service based on Apache NiFi.
  • Automatically configures the architecture required for scalable clustering based on Apache NiFi, and automatically installs the NiFi, ZooKeeper, and NiFi Registry modules.
  • When creating the NiFi service, you can set the Description, required resource size, connection ID/PW, and Host Alias.
  • After creating the service, you can modify the Description, required resource size, connection password, Host Alias, and other settings, and apply the changes to the service.
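
The settings listed above can be modeled as a small record, which makes the difference between create-time and modify-time settings explicit. This is only an illustrative sketch: the class `NifiServiceSettings`, its field names, and the `apply_changes` helper are hypothetical and are not part of any Samsung Cloud Platform API. It assumes that the connection ID is fixed at creation, because only the password appears among the modifiable settings, and it models Host Alias as a mapping from IP address to hostnames.

```python
from dataclasses import dataclass, field, replace

# Fields the text lists as modifiable after creation (field names are hypothetical).
MODIFIABLE = {"description", "cpu_cores", "memory_gb", "password", "host_aliases"}


@dataclass(frozen=True)
class NifiServiceSettings:
    """Hypothetical record of the settings chosen when creating the service."""
    description: str
    cpu_cores: int
    memory_gb: int
    connection_id: str
    password: str
    # Host Alias entries, modeled here as IP address -> list of hostnames (assumption).
    host_aliases: dict = field(default_factory=dict)


def apply_changes(settings, **changes):
    """Return updated settings, rejecting fields assumed to be fixed at creation."""
    blocked = set(changes) - MODIFIABLE
    if blocked:
        raise ValueError(f"not modifiable after creation: {sorted(blocked)}")
    return replace(settings, **changes)


# Example values are made up for illustration.
svc = NifiServiceSettings("ETL flows", 8, 32, "admin", "initial-pw")
svc = apply_changes(svc, description="ETL flows (prod)", password="rotated-pw")
```

Returning a new record instead of mutating the old one keeps the previous settings available for comparison before the changes are applied to the service.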

Server spec type

When creating a Data Flow service, check the following.

  • Recommended service installation specifications: 21 CPU cores, 57 GB of memory, and at least 100 GB of storage
Reference
  • Before creating the Data Flow service, you need to install the Ingress Controller.
  • Only one Ingress Controller can be installed in a Kubernetes cluster.
  • For more details, refer to Ingress Controller Installation.
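
Before creating the service, it can help to compare the resources available in the cluster against the recommended specification. The following is a minimal sketch under stated assumptions: the `Resources` type and the `shortfalls` function are hypothetical helpers, and the available values are made-up inputs that you would replace with figures from your own cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_cores: float
    memory_gb: float
    storage_gb: float


# Recommended installation specification quoted in this section.
RECOMMENDED = Resources(cpu_cores=21, memory_gb=57, storage_gb=100)


def shortfalls(available, required=RECOMMENDED):
    """Return {resource: missing amount} for every resource below the requirement."""
    missing = {}
    for name in ("cpu_cores", "memory_gb", "storage_gb"):
        gap = getattr(required, name) - getattr(available, name)
        if gap > 0:
            missing[name] = gap
    return missing


# Hypothetical cluster with enough memory and storage but too few CPU cores.
print(shortfalls(Resources(cpu_cores=16, memory_gb=64, storage_gb=200)))
# {'cpu_cores': 5}
```

An empty result means every resource meets or exceeds the recommendation.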

Provision status by region

Data Flow is available in the environments below.

Region | Provision status
Korea West (kr-west1) | Provided
Korea East (kr-east1) | Provided
South Korea 1 (kr-south1) | Not provided
South Korea 2 (kr-south2) | Not provided
South Korea 3 (kr-south3) | Not provided
Table. Data Flow regional availability status

Prerequisite Services

This is a list of services that must be pre‑configured before creating the service. Please refer to the guide provided for each service and prepare in advance.

Service category | Service | Detailed description
Storage | File Storage | Storage that enables multiple client servers to share files over a network connection.
Container | Kubernetes Engine | Kubernetes container orchestration service
Table. Data Flow prerequisite services

1 - ServiceWatch Metrics

In ServiceWatch, you can view Kubernetes Engine metrics for the Kubernetes Engine cluster used by Data Flow. As with Kubernetes Engine, the metrics provided by default monitoring are collected at one-minute intervals.

Reference
Refer to the ServiceWatch guide for how to view metrics in ServiceWatch.

Basic Metrics

The following are the default metrics for the Kubernetes Engine namespace.

The metrics whose names are displayed in bold below are the key metrics selected from the default metrics provided by Kubernetes Engine. Key metrics are used to build service dashboards that are automatically created for each service in ServiceWatch.

For each metric, the user guide indicates which statistics are meaningful when querying that metric. Among the meaningful statistics, the values shown in bold are the primary statistics. The service dashboard displays key metrics using their primary statistics.
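
The four statistics that appear for every metric can be illustrated with a short sketch. It assumes that a query aggregates the one-minute datapoints that fall within a period; the `summarize` function and the sample values are hypothetical and do not represent the ServiceWatch API.

```python
from statistics import mean


def summarize(samples):
    """Compute the four statistics for one aggregation period."""
    if not samples:
        raise ValueError("no datapoints in this period")
    return {
        "Total": sum(samples),
        "Average": mean(samples),
        "Maximum": max(samples),
        "Minimum": min(samples),
    }


# Five hypothetical one-minute samples of namespace_number_of_running_pods.
running_pods = [12, 12, 13, 11, 12]
print(summarize(running_pods))
```

Which of these values is worth reading depends on the metric, which is why the user guide marks the meaningful statistics per metric.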

Metric name | Detailed description | Unit | Meaningful statistics
cluster_up | Cluster up | Count | Total, Average, Maximum, Minimum
cluster_node_count | Number of cluster nodes | Count | Total, Average, Maximum, Minimum
cluster_failed_node_count | Number of failed nodes in the cluster | Count | Total, Average, Maximum, Minimum
cluster_namespace_phase_count | Cluster namespace phase count | Count | Total, Average, Maximum, Minimum
cluster_pod_phase_count | Cluster pod phase count | Count | Total, Average, Maximum, Minimum
node_cpu_allocatable | Node CPU allocatable | - | Total, Average, Maximum, Minimum
node_cpu_capacity | Node CPU capacity | - | Total, Average, Maximum, Minimum
node_cpu_usage | Node CPU usage | - | Total, Average, Maximum, Minimum
node_cpu_utilization | Node CPU utilization | - | Total, Average, Maximum, Minimum
node_memory_allocatable | Node memory allocatable | Bytes | Total, Average, Maximum, Minimum
node_memory_capacity | Node memory capacity | Bytes | Total, Average, Maximum, Minimum
node_memory_usage | Node memory usage | Bytes | Total, Average, Maximum, Minimum
node_memory_utilization | Node memory utilization | - | Total, Average, Maximum, Minimum
node_network_rx_bytes | Node network received bytes | Bytes/Second | Total, Average, Maximum, Minimum
node_network_tx_bytes | Node network transmitted bytes | Bytes/Second | Total, Average, Maximum, Minimum
node_network_total_bytes | Node network total bytes | Bytes/Second | Total, Average, Maximum, Minimum
node_number_of_running_pods | Number of pods running on the node | Count | Total, Average, Maximum, Minimum
namespace_number_of_running_pods | Number of pods running in the namespace | Count | Total, Average, Maximum, Minimum
namespace_deployment_pod_count | Namespace Deployment pod count | Count | Total, Average, Maximum, Minimum
namespace_statefulset_pod_count | Namespace StatefulSet pod count | Count | Total, Average, Maximum, Minimum
namespace_daemonset_pod_count | Namespace DaemonSet pod count | Count | Total, Average, Maximum, Minimum
namespace_job_active_count | Namespace active Job count | Count | Total, Average, Maximum, Minimum
namespace_cronjob_active_count | Namespace active CronJob count | Count | Total, Average, Maximum, Minimum
pod_cpu_usage | Pod CPU usage | - | Total, Average, Maximum, Minimum
pod_memory_usage | Pod memory usage | Bytes | Total, Average, Maximum, Minimum
pod_network_rx_bytes | Pod network received bytes | Bytes/Second | Total, Average, Maximum, Minimum
pod_network_tx_bytes | Pod network transmitted bytes | Bytes/Second | Total, Average, Maximum, Minimum
pod_network_total_bytes | Pod network total bytes | Count | Total, Average, Maximum, Minimum
container_cpu_usage | Container CPU usage | - | Total, Average, Maximum, Minimum
container_cpu_limit | Container CPU limit | - | Total, Average, Maximum, Minimum
container_cpu_utilization | Container CPU utilization | - | Total, Average, Maximum, Minimum
container_memory_usage | Container memory usage | Bytes | Total, Average, Maximum, Minimum
container_memory_limit | Container memory limit | Bytes | Total, Average, Maximum, Minimum
container_memory_utilization | Container memory utilization | - | Total, Average, Maximum, Minimum
node_gpu_count | Node GPU count | Count | Total, Average, Maximum, Minimum
gpu_temp | GPU temperature | - | Total, Average, Maximum, Minimum
gpu_power_usage | GPU power usage | - | Total, Average, Maximum, Minimum
gpu_util | GPU utilization | Percent | Total, Average, Maximum, Minimum
gpu_sm_clock | GPU SM clock | - | Total, Average, Maximum, Minimum
gpu_fb_used | GPU FB usage | Megabytes | Total, Average, Maximum, Minimum
gpu_tensor_active | GPU tensor utilization | - | Total, Average, Maximum, Minimum
pod_gpu_util | Pod GPU utilization | Percent | Total, Average, Maximum, Minimum
pod_gpu_tensor_active | Pod GPU tensor utilization | - | Total, Average, Maximum, Minimum
Table. Kubernetes Engine Basic Metrics
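
The table lists usage, limit, and utilization as separate container metrics. This document does not define how utilization is derived. The sketch below assumes it is usage divided by limit, expressed as a percentage, and treats a missing or zero limit as undefined. The function name and the sample readings are hypothetical.

```python
def utilization_percent(usage, limit):
    """Return usage as a percentage of limit, or None when no limit is set.

    Assumption: utilization = usage / limit * 100. The table lists the three
    metrics but does not state this relationship.
    """
    if limit is None or limit <= 0:
        return None
    return usage / limit * 100.0


# Hypothetical readings for container_memory_usage and container_memory_limit (bytes).
usage_bytes = 768 * 1024**2
limit_bytes = 1024 * 1024**2
print(utilization_percent(usage_bytes, limit_bytes))  # 75.0
```

Returning `None` rather than zero keeps containers without a limit from being mistaken for idle ones.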