The page has been translated by Gen AI.

Install ServiceWatch Agent

Users can install the ServiceWatch Agent on GPU nodes of a Multi-node GPU Cluster to collect custom metrics and logs.

Reference
Collecting custom metrics/logs via the ServiceWatch Agent is currently available only on Samsung Cloud Platform For Enterprise. It will also be available in other offerings in the future.
Caution
Metrics collected through the ServiceWatch Agent are classified as custom metrics and, unlike the metrics collected by default, incur charges. It is recommended to remove or disable any unnecessary metric collection settings.

ServiceWatch Agent

In a Multi-node GPU Cluster, the agents that must be installed on GPU nodes to collect ServiceWatch custom metrics and logs fall into two main types: the Prometheus Exporter and the Open Telemetry Collector.

Category | Detailed description
Prometheus Exporter | Provides metrics of a specific application or service in a format that Prometheus can scrape.
  • To collect OS metrics on a GPU node, use the Node Exporter for Linux servers or the Windows Exporter for Windows servers, depending on the OS type.
Open Telemetry Collector | Acts as a centralized collector that gathers telemetry data such as metrics and logs from distributed systems, processes it (filtering, sampling, etc.), and exports it to multiple backends (e.g., Prometheus, Jaeger, Elasticsearch).
  • Exports data to the ServiceWatch Gateway so that ServiceWatch can collect metric and log data.
Table. Explanation of Prometheus Exporter and Open Telemetry Collector
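
To make "a format that Prometheus can scrape" more concrete, the following is a minimal Python sketch that parses a few lines of the Prometheus text exposition format and keeps only an allowlisted set of metric names. It is illustrative only and is not part of the ServiceWatch Agent. The sample values are made up, the function names parse_exposition and keep_allowed are hypothetical, and the parser is simplified (it ignores timestamps and escaped quotes in label values). In practice, which metrics are collected is controlled through the exporter and collector settings, not through code like this.

```python
import re

# Sample text in the Prometheus text exposition format. The metric names
# resemble those emitted by Node Exporter; the values are invented.
SAMPLE = """\
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 8.0e+09
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5
node_cpu_seconds_total{cpu="0",mode="user"} 67.25
"""

# metric_name, optional {label="value",...} block, then the sample value.
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def keep_allowed(samples, allowed_names):
    """Keep only samples whose metric name is in the allowlist."""
    return [sample for sample in samples if sample[0] in allowed_names]


if __name__ == "__main__":
    parsed = parse_exposition(SAMPLE)
    for name, labels, value in keep_allowed(parsed, {"node_memory_MemAvailable_bytes"}):
        print(name, labels, value)
```

Restricting collection to a small allowlist reflects the caution above: every metric forwarded as a custom metric incurs charges, so only the metrics that are actually needed should be kept.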
Information

If Kubernetes Engine is configured on a GPU node, use the metrics provided by Kubernetes Engine to view GPU metrics.

  • The DCGM Exporter may not operate correctly if it is installed on a GPU node where Kubernetes Engine is configured.
Reference
The ServiceWatch Agent guide for collecting GPU metrics on a GPU node is the same as the guide for a GPU Server. For details, refer to GPU Server > ServiceWatch Agent.

Pre-configuration for Using ServiceWatch Agent

Before using the ServiceWatch Agent, complete the steps described in Prerequisite Settings for ServiceWatch Agent.
