The page has been translated by Gen AI.
Install ServiceWatch Agent
Users can install the ServiceWatch Agent on GPU nodes of a Multi-node GPU Cluster to collect custom metrics and logs.
Reference
Collecting custom metrics/logs via the ServiceWatch Agent is currently available only on Samsung Cloud Platform For Enterprise. It will also be available in other offerings in the future.
Caution
Since metric collection through the ServiceWatch Agent is classified as custom metrics and incurs charges unlike the default collected metrics, it is recommended to remove or disable unnecessary metric collection settings.
ServiceWatch Agent
In a Multi-node GPU Cluster, the agents that need to be installed on GPU nodes to collect ServiceWatch custom metrics and logs can be divided into two main types. It is a Prometheus Exporter and Open Telemetry Collector.
| Category | Detailed description | |
|---|---|---|
| Prometheus Exporter | Provide metrics of a specific application or service in a format that Prometheus can scrape
| |
| Open Telemetry Collector | Acts as a centralized collector that gathers telemetry data such as metrics and logs from distributed systems, processes (filtering, sampling, etc.) it, and exports it to multiple backends (e.g., Prometheus, Jaeger, Elasticsearch, etc.)
|
Table. Explanation of Prometheus Exporter and Open Telemetry Collector
information
If you have configured a Kubernetes Engine on a GPU node, please view the GPU metrics using the metrics provided by the Kubernetes Engine.
- If you install the DCGM Exporter on a GPU node where Kubernetes Engine is configured, it may not operate correctly.
Reference
The ServiceWatch Agent guide for collecting GPU metrics on a GPU Node can be used the same as on a GPU Server.
For more details, refer to GPU Server > ServiceWatch Agent.
Pre-configuration for Using ServiceWatch Agent
To use the ServiceWatch Agent, please refer to Prerequisite Settings for ServiceWatch Agent and prepare the prerequisite settings.