Installing ServiceWatch Agent
Users can install ServiceWatch Agent on the GPU Nodes of a Multi-node GPU Cluster to collect custom metrics and logs.
Note
Collection of custom metrics and logs through ServiceWatch Agent is currently available only in Samsung Cloud Platform For Enterprise. Support for other offerings is planned.
Warning
Metrics collected through ServiceWatch Agent are classified as custom metrics and, unlike the metrics collected by default, are charged. It is recommended to remove or disable any metric collection settings you do not need.
ServiceWatch Agent
Two types of agent must be installed on the GPU Nodes of a Multi-node GPU Cluster to collect custom metrics and logs for ServiceWatch: Prometheus Exporter and Open Telemetry Collector.
| Item | Description |
|---|---|
| Prometheus Exporter | Provides metrics of specific applications or services in a format that Prometheus can scrape |
| Open Telemetry Collector | Acts as a central collector that collects telemetry data such as metrics and logs from distributed systems, processes them (filtering, sampling, etc.), and sends them to multiple backends (e.g., Prometheus, Jaeger, Elasticsearch) |
Table. Description of Prometheus Exporter and Open Telemetry Collector
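To make the two roles in the table concrete, the following is a minimal sketch, not the ServiceWatch Agent implementation. It assumes the standard Prometheus text exposition format and uses only the Python standard library. The function names (`render_exposition`, `keep_allowed`) and the metric names prefixed with `demo_` are hypothetical and chosen for illustration only. The first function shows the kind of text an exporter serves for scraping. The second mimics the filtering step a collector might apply so that only the metrics you actually need are forwarded.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Matches a metric name at the start of a sample line.
_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def _escape(value: str) -> str:
    """Escape a label value as required by the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_exposition(
    name: str,
    help_text: str,
    metric_type: str,
    samples: Iterable[Tuple[Dict[str, str], float]],
) -> str:
    """Render one metric family as scrapeable text (exporter side)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{_escape(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def metric_name(line: str) -> Optional[str]:
    """Return the metric name a line belongs to, or None for other lines."""
    if line.startswith(("# HELP ", "# TYPE ")):
        parts = line.split(" ", 3)
        return parts[2] if len(parts) > 2 else None
    if not line.strip() or line.startswith("#"):
        return None
    match = _NAME.match(line)
    return match.group(0) if match else None


def keep_allowed(text: str, allowed: Set[str]) -> str:
    """Keep only lines for allow-listed metrics (collector-style filtering).

    Blank lines and free-form comments are dropped as well.
    """
    kept = [line for line in text.splitlines() if metric_name(line) in allowed]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Hypothetical metric names; a real exporter defines its own.
    scraped = render_exposition(
        "demo_gpu_utilization_percent",
        "Hypothetical GPU utilization.",
        "gauge",
        [({"gpu": "0"}, 42.0)],
    ) + render_exposition(
        "demo_gpu_temperature_celsius",
        "Hypothetical GPU temperature.",
        "gauge",
        [({"gpu": "0"}, 60.0)],
    )
    print(keep_allowed(scraped, {"demo_gpu_utilization_percent"}))
```

Because custom metrics are charged, an allow list of this kind is one way to think about limiting collection to the metrics that are needed. The actual filtering is configured in the agent itself, as described in the referenced ServiceWatch Agent guide.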
Notice
If Kubernetes Engine is configured on the GPU Node, check GPU metrics using the metrics that Kubernetes Engine provides.
- DCGM Exporter may not operate normally if it is installed on a GPU Node where Kubernetes Engine is configured.
Note
The ServiceWatch Agent guide for collecting GPU metrics applies to GPU Nodes in the same way as to GPU Server.
For details, see GPU Server > ServiceWatch Agent.
Pre-settings for Using ServiceWatch Agent
Before using ServiceWatch Agent, complete the prerequisite settings described in Pre-environment Setup for ServiceWatch Agent.