Overview


Service Overview

Data Flow is a data processing flow tool, based on the open-source Apache NiFi, for visually building processing flows that extract large volumes of data from various data sources and transform or transfer stream and batch data. Data Flow can be used on its own in the Kubernetes Engine cluster environment of Samsung Cloud Platform, or together with other application software.

Provided Features

Data Flow provides the following features.

Convenient Installation and Management: Data Flow can be easily installed in a standard Kubernetes cluster environment via the web-based Samsung Cloud Platform Console. It automatically configures the architecture required for scalable clustering based on the open-source Apache NiFi, and automatically installs ZooKeeper, Registry, and management modules. Through Data Flow, you can also configure and deploy the configuration files, NiFi templates, and other items needed for service connections.
Easy Data Flow Management: You can easily create processing flows for stream and batch data in a GUI tailored to the user environment. GUI-based flow creation lets you efficiently extract, transfer, and process data between systems.
NiFi Template Gallery: You can share and distribute reference NiFi templates. Data Flow provides work files for data processing flows commonly used in the field as a gallery, and users can share the data processing flows they have created.

Components

Data Flow consists of Manager and Service modules, and is provided packaged with Apache NiFi.

Data Flow Manager

Data Flow Manager provides various management features that enable more efficient use of NiFi.

Through Data Flow Manager, you can upload NAR files that you have created for use in Processors, and upload configuration files to share them.
Frequently used NiFi templates are packaged as assets and offered in the Gallery, ready for use with a single click.
It provides real-time monitoring of the multiple services configured as native NiFi services, as well as resource status monitoring.
You can easily provision configuration information for NiFi components within the cluster.

Data Flow Service

Data Flow Service provides a data flow management service based on Apache NiFi.
It automatically configures the architecture required for scalable clustering based on Apache NiFi, and automatically installs the NiFi, ZooKeeper, and NiFi Registry modules.
When provisioning NiFi, you can set the Description, required resource size, connection ID/PW, and Host Alias.
After creating the service, you can modify the Description, required resource size, connection password, Host Alias, and other settings, and apply the changes to the service.

Server spec type

When creating a Data Flow service, check the following.

Recommended service installation specifications: CPU 21 cores, memory 57 GB, storage 100 GB or more
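
As a rough illustration, the recommended specifications above can be compared against the resources you plan to allocate before creating the service. The sketch below is not part of Data Flow or the Samsung Cloud Platform Console; the names ResourceSpec and shortfalls are hypothetical, and it assumes that each recommended value is treated as a minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical container for the three figures named in this guide."""
    cpu_cores: int
    memory_gb: int
    storage_gb: int


# Recommended installation specifications quoted in this guide.
# Assumption: each value is interpreted as a lower bound.
RECOMMENDED = ResourceSpec(cpu_cores=21, memory_gb=57, storage_gb=100)


def shortfalls(available: ResourceSpec,
               required: ResourceSpec = RECOMMENDED) -> dict:
    """Return how far each resource falls short (empty dict if all are met)."""
    gaps = {}
    for name in ("cpu_cores", "memory_gb", "storage_gb"):
        missing = getattr(required, name) - getattr(available, name)
        if missing > 0:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    planned = ResourceSpec(cpu_cores=16, memory_gb=64, storage_gb=50)
    print(shortfalls(planned))
```

An empty result means the planned allocation meets all three recommended values; otherwise the result lists the missing amount per resource.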

Provision status by region

Data Flow is available in the environments below.

Region	Provision status
Korea West (kr-west1)	Provided
Korea East (kr-east1)	Provided
Korea South 1 (kr-south1)	Not provided
Korea South 2 (kr-south2)	Not provided
Korea South 3 (kr-south3)	Not provided

Table. Data Flow regional availability status

Prerequisite Services

The following services must be configured before you create a Data Flow service. Refer to the guide provided for each service and prepare them in advance.

Service Category	Service	Details
Storage	File Storage	Storage where multiple client servers share files via a network connection
Container	Kubernetes Engine	Kubernetes container orchestration service

Table. Data Flow prerequisite services

1 - ServiceWatch metric

In ServiceWatch, you can view Kubernetes Engine metrics for the Kubernetes Engine created by Data Flow. As with Kubernetes Engine, the metrics provided by default monitoring are collected at one-minute intervals.

Reference

Refer to the ServiceWatch guide for how to view metrics in ServiceWatch.

Basic Metrics

The following are the default metrics for the Kubernetes Engine namespace.

The metrics whose names are displayed in bold below are the key metrics selected from the default metrics provided by Kubernetes Engine. Key metrics are used to build service dashboards that are automatically created for each service in ServiceWatch.

For each metric, the user guide indicates which statistics are meaningful when querying that metric; among these, the statistics shown in bold are the primary statistics. In the service dashboard, key metrics are displayed using their primary statistics.
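
To make the four statistics concrete, the sketch below computes Total, Average, Maximum, and Minimum over a series of one-minute samples. It is only an illustration of what these statistics mean; the function name summarize is hypothetical, the sample values are invented for the example, and the code does not call any ServiceWatch API.

```python
from statistics import mean


def summarize(samples):
    """Compute the four statistics listed in the metrics table for one period."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {
        "Total": sum(samples),
        "Average": mean(samples),
        "Maximum": max(samples),
        "Minimum": min(samples),
    }


if __name__ == "__main__":
    # Three hypothetical one-minute samples of a count-type metric.
    print(summarize([2.0, 4.0, 6.0]))
```

Which statistic is meaningful depends on the metric: for a count such as the number of running pods, Maximum or Average is usually more informative than Total over a multi-minute period.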

Metric name	Description	Unit	Meaningful statistics
cluster_up	Cluster up	Count	Total Average Maximum Minimum
cluster_node_count	Number of cluster nodes	Count	Total Average Maximum Minimum
cluster_failed_node_count	Number of failed nodes in the cluster	Count	Total Average Maximum Minimum
cluster_namespace_phase_count	Number of cluster namespace phases	Count	Total Average Maximum Minimum
cluster_pod_phase_count	Cluster pod phase count	Count	Total Average Maximum Minimum
node_cpu_allocatable	Node CPU allocatable	-	Total Average Maximum Minimum
node_cpu_capacity	Node CPU capacity	-	Total Average Maximum Minimum
node_cpu_usage	Node CPU usage	-	Total Average Maximum Minimum
node_cpu_utilization	Node CPU utilization	-	Total Average Maximum Minimum
node_memory_allocatable	Node memory allocatable amount	Bytes	Total Average Maximum Minimum
node_memory_capacity	Node memory capacity	Bytes	Total Average Maximum Minimum
node_memory_usage	Node memory usage	Bytes	Total Average Maximum Minimum
node_memory_utilization	Node memory utilization	-	Total Average Maximum Minimum
node_network_rx_bytes	Node network received bytes	Bytes/Second	Total Average Maximum Minimum
node_network_tx_bytes	Node network transmitted bytes	Bytes/Second	Total Average Maximum Minimum
node_network_total_bytes	Total bytes of the node network	Bytes/Second	Total Average Maximum Minimum
node_number_of_running_pods	Number of pods running on a node	Count	Total Average Maximum Minimum
namespace_number_of_running_pods	Number of running pods in the namespace	Count	Total Average Maximum Minimum
namespace_deployment_pod_count	Namespace deployment pod count	Count	Total Average Maximum Minimum
namespace_statefulset_pod_count	Namespace StatefulSet pod count	Count	Total Average Maximum Minimum
namespace_daemonset_pod_count	Namespace daemonset pod count	Count	Total Average Maximum Minimum
namespace_job_active_count	Active namespace job count	Count	Total Average Maximum Minimum
namespace_cronjob_active_count	Number of active namespace cronjobs	Count	Total Average Maximum Minimum
pod_cpu_usage	Pod CPU usage	-	Total Average Maximum Minimum
pod_memory_usage	Pod memory usage	Bytes	Total Average Maximum Minimum
pod_network_rx_bytes	Pod network received bytes	Bytes/Second	Total Average Maximum Minimum
pod_network_tx_bytes	Pod network transmitted bytes	Bytes/Second	Total Average Maximum Minimum
pod_network_total_bytes	Pod network total bytes	Bytes/Second	Total Average Maximum Minimum
container_cpu_usage	Container CPU usage	-	Total Average Maximum Minimum
container_cpu_limit	Container CPU limit	-	Total Average Maximum Minimum
container_cpu_utilization	Container CPU utilization	-	Total Average Maximum Minimum
container_memory_usage	Container memory usage	Bytes	Total Average Maximum Minimum
container_memory_limit	Container memory limit	Bytes	Total Average Maximum Minimum
container_memory_utilization	Container memory utilization	-	Total Average Maximum Minimum
node_gpu_count	Node GPU count	Count	Total Average Maximum Minimum
gpu_temp	GPU temperature	-	Total Average Maximum Minimum
gpu_power_usage	GPU power usage	-	Total Average Maximum Minimum
gpu_util	GPU utilization	Percent	Total Average Maximum Minimum
gpu_sm_clock	GPU SM clock	-	Total Average Maximum Minimum
gpu_fb_used	GPU FB usage	Megabytes	Total Average Maximum Minimum
gpu_tensor_active	GPU tensor utilization	-	Total Average Maximum Minimum
pod_gpu_util	Pod GPU utilization	Percent	Total Average Maximum Minimum
pod_gpu_tensor_active	Pod GPU tensor utilization	-	Total Average Maximum Minimum

Table. Kubernetes Engine Basic Metrics
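
The table lists usage, limit, and utilization metrics side by side without stating how they relate. The sketch below shows one plausible reading, in which utilization is usage divided by the corresponding limit. This relationship is an assumption, not something this guide confirms; the function name utilization_percent is hypothetical, and ServiceWatch may report utilization on a different scale (for example, a 0 to 1 ratio).

```python
from typing import Optional


def utilization_percent(usage: float, limit: float) -> Optional[float]:
    """Return usage as a percentage of limit, or None when no limit is set.

    Assumption: utilization = usage / limit. The guide does not define it.
    """
    if limit <= 0:
        return None
    return 100.0 * usage / limit


if __name__ == "__main__":
    # Hypothetical values for container_memory_usage and container_memory_limit.
    usage_bytes = 512 * 1024 ** 2   # 512 MiB
    limit_bytes = 2 * 1024 ** 3     # 2 GiB
    print(utilization_percent(usage_bytes, limit_bytes))
```

Returning None for a missing limit avoids a division by zero and keeps containers without a limit distinguishable from containers at 0% utilization.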