The page has been translated by Gen AI.

Installing ServiceWatch Agent

Users can install ServiceWatch Agent on the GPU Nodes of a Multi-node GPU Cluster to collect custom metrics and logs.

Note
Custom metric and log collection through ServiceWatch Agent is currently available only in Samsung Cloud Platform For Enterprise. Support for other offerings is planned.
Warning
Metrics collected through ServiceWatch Agent are classified as custom metrics and, unlike metrics collected by default, incur charges. Remove or disable any metric collection settings you do not need.

ServiceWatch Agent

To collect ServiceWatch custom metrics and logs on the GPU Nodes of a Multi-node GPU Cluster, two main types of agent need to be installed: Prometheus Exporter and Open Telemetry Collector.

Item | Description
Prometheus Exporter | Provides metrics of specific applications or services in a format that Prometheus can scrape
  • For OS metric collection on GPU Nodes, you can use Node Exporter for Linux servers and Windows Exporter for Windows servers depending on the OS type.
Open Telemetry Collector | Acts as a central collector that collects telemetry data such as metrics and logs from distributed systems, processes them (filtering, sampling, etc.), and sends them to multiple backends (e.g., Prometheus, Jaeger, Elasticsearch, etc.)
  • Enables ServiceWatch to collect metrics and log data by sending data to ServiceWatch Gateway.
Table. Description of Prometheus Exporter and Open Telemetry Collector
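
To make "a format that Prometheus can scrape" concrete, the following is a minimal Python sketch of reading that text format and keeping only selected metric names, which mirrors the recommendation to avoid collecting metrics you do not need. It is an illustration only: the sample values are made up, the function names parse_samples and keep_allowed are hypothetical, and in an actual deployment the scraping and filtering are done by the Open Telemetry Collector configuration rather than by Python code.

```python
import re

# Illustrative sample of the Prometheus text exposition format, similar to
# what Node Exporter serves over HTTP. The values are made up.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_cpu_seconds_total{cpu="0",mode="user"} 78.9
# HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
# TYPE node_memory_MemAvailable_bytes gauge
node_memory_MemAvailable_bytes 8.0e+09
"""

# metric_name{labels} value [timestamp]
# Simplified: assumes label values contain no "}" and no unusual escapes.
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples (hypothetical helper)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # Ignore lines this simplified parser cannot read.
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def keep_allowed(samples, allowed_names):
    """Keep only samples whose metric name is in the allowlist."""
    return [sample for sample in samples if sample[0] in allowed_names]


if __name__ == "__main__":
    all_samples = parse_samples(SAMPLE)
    for name, labels, value in keep_allowed(
        all_samples, {"node_memory_MemAvailable_bytes"}
    ):
        print(name, labels, value)
```

An allowlist is used here rather than a denylist because an exporter can expose many metrics by default, and listing only the metrics you want keeps the set of chargeable custom metrics explicit.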
Notice

If Kubernetes Engine is configured on the GPU Node, check GPU metrics through the metrics that Kubernetes Engine provides.

  • If DCGM Exporter is installed on a GPU Node where Kubernetes Engine is configured, it may not operate normally.
Note
The ServiceWatch Agent guide for GPU metric collection on GPU Nodes can be used in the same way as for GPU Server. For details, see GPU Server > ServiceWatch Agent.

Pre-settings for Using ServiceWatch Agent

Before using ServiceWatch Agent, complete the required pre-settings as described in Pre-environment Setup for ServiceWatch Agent.
