ServiceWatch Agent Install
Users can install the ServiceWatch Agent on the GPU nodes of a Multi-node GPU Cluster to collect custom metrics and logs.
Reference
Collecting custom metrics and logs via the ServiceWatch Agent is currently available only on Samsung Cloud Platform For Enterprise. It will be made available in other offerings in the future.
Caution
Metrics collected via the ServiceWatch Agent are classified as custom metrics and, unlike the metrics collected by default, incur charges. It is recommended to remove or disable any unnecessary metric collection settings.
ServiceWatch Agent
The agents that must be installed on the GPU nodes of a multi-node GPU cluster to collect ServiceWatch custom metrics and logs fall into two broad types: the Prometheus Exporter and the Open Telemetry Collector.
| Category | Detailed description |
|---|---|
| Prometheus Exporter | Provides metrics of a specific application or service in a format that Prometheus can scrape |
| Open Telemetry Collector | Acts as a centralized collector that gathers telemetry data such as metrics and logs from distributed systems, processes them (filtering, sampling, etc.), and then exports them to various backends (e.g., Prometheus, Jaeger, Elasticsearch) |
Table. Description of Prometheus Exporter and Open Telemetry Collector
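To make the two roles in the table more concrete, the following is a minimal sketch in Python, not the actual ServiceWatch Agent implementation. It assumes a hypothetical `Sample` tuple, renders samples in the Prometheus text exposition format (the exporter role), and applies an allowlist filter before rendering (one kind of processing a collector might perform). The metric names are only illustrative examples of the kind a GPU exporter could expose, and the function names `render_exposition` and `keep_allowed` are made up for this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical sample type for this sketch: (metric name, labels, value).
Sample = Tuple[str, Dict[str, str], float]


def render_exposition(samples: Iterable[Sample]) -> str:
    """Render samples as Prometheus text exposition lines.

    Simplified: no HELP/TYPE comments and no escaping of label values.
    """
    lines: List[str] = []
    for name, labels, value in samples:
        if labels:
            # Sort labels so the output is deterministic.
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def keep_allowed(samples: Iterable[Sample], allowed: Set[str]) -> List[Sample]:
    """Keep only samples whose metric name is in the allowlist."""
    return [sample for sample in samples if sample[0] in allowed]


# Illustrative data only; names and values are assumptions for this example.
samples: List[Sample] = [
    ("DCGM_FI_DEV_GPU_UTIL", {"gpu": "0"}, 87.0),
    ("DCGM_FI_DEV_FB_USED", {"gpu": "0"}, 1024.0),
]

# Forward only the metric that is actually needed.
print(render_exposition(keep_allowed(samples, {"DCGM_FI_DEV_GPU_UTIL"})), end="")
```

Filtering down to the metrics that are actually needed, as in `keep_allowed`, reflects the same idea as the caution above: because custom metrics incur charges, collecting fewer of them keeps costs down.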
Notice
If Kubernetes Engine is configured on a GPU Node, check GPU metrics using the metrics that Kubernetes Engine provides.
- If you install the DCGM Exporter on a GPU node where Kubernetes Engine is configured, the exporter may not work properly.
Reference
The procedure for using the ServiceWatch Agent to collect GPU metrics on a GPU Node is the same as on a GPU Server.
For more details, see GPU Server > ServiceWatch Agent.