Metrics
Metrics are data about system performance. By default, many services provide free metrics for their resources (for example, Virtual Server and File Storage), and these are available as basic monitoring through ServiceWatch. Detailed monitoring is available for some resources, such as Virtual Server.
Metric data is retained for 15 months (455 days), so you can view both the latest data and historical data.
| Term | Example | Description |
|---|---|---|
| Namespace | Virtual Server | Logical division for distinguishing and grouping metrics |
| Metric | CPU usage | Name of the specific data to be collected |
| Dimension | resource_id | Unique identifier for the metric |
| Collection Interval | 5 minutes | Collection interval of metric data from each service providing metrics |
| Statistics | Average | How to aggregate metric data over a specified period |
| Unit | % | Statistical measurement unit |
| Aggregation Period | 5 minutes | Period for aggregating collected metric data |
| Alert | CPU usage >= 80% for 5 minutes | If CPU usage remains at 80% or higher for 5 minutes, the state changes to Alert |
Namespace
A namespace is a logical division used to distinguish and group ServiceWatch metrics. The namespace of a Samsung Cloud Platform service is in most cases the same as the service name, and can be found in the ServiceWatch Integrated Service List.
For custom metrics, users can define a namespace that distinguishes them from other metrics in ServiceWatch, either through the ServiceWatch Agent settings or through OpenAPI. For details about custom metrics and logs, see Custom Metrics and Logs.
Metric
A metric represents a set of data points collected in ServiceWatch, sorted chronologically. A data point consists of a timestamp, the collected data, and the unit of the data.
For example, the CPU usage of a specific Virtual Server is one of the basic monitoring metrics provided by Virtual Server. A data point can originate from any application or activity that collects data.
By default, Samsung Cloud Platform services integrated with ServiceWatch provide metrics for their resources free of charge. Detailed monitoring for some resources is a paid feature and can be enabled in each service.
Metrics can only be viewed in the region where they were created. Users cannot delete metrics manually. However, a metric expires automatically after 15 months if no new data is published to ServiceWatch. Data points expire on a rolling basis: as new data points are added, those older than 15 months (455 days) are deleted.
Timestamp
The timestamp of a data point indicates the time at which the data point was recorded. Each metric data point consists of a timestamp and data.
A timestamp consists of the date, hours, minutes, and seconds.
Metric Retention Period
ServiceWatch metric data is retained as follows.
- Data points with a collection interval of 60 seconds (1 minute) are available for 15 days.
- Data points with a collection interval of 300 seconds (5 minutes) are available for 63 days.
- Data points with a collection interval of 3600 seconds (1 hour) are available for 455 days (15 months).
The data points that were initially collected at a short collection interval are downsampled and stored for long-term retention.
For example, if data is collected at a 1‑minute interval, it is retained in 1‑minute granularity for 15 days. After 15 days, the data continues to be retained, but can only be queried in 5‑minute intervals. After 63 days, the data is re‑aggregated and provided in 1‑hour intervals. If you need to retain metric data points longer than the metric retention period, you can store them separately via the File Download or Export to Object Storage functions.
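The retention tiers above can be sketched as a small lookup. This is only an illustration of the rule as described on this page, not ServiceWatch code; the names `RETENTION_TIERS` and `finest_queryable_interval` are hypothetical.

```python
# Retention tiers from the list above: (granularity in seconds, retention in days).
RETENTION_TIERS = [
    (60, 15),     # 1-minute data points are available for 15 days
    (300, 63),    # 5-minute data points are available for 63 days
    (3600, 455),  # 1-hour data points are available for 455 days
]


def finest_queryable_interval(collection_interval_s, age_days):
    """Return the finest granularity (in seconds) at which data of the given
    age can still be queried, or None if the data has expired.

    Hypothetical helper: it only encodes the tiers described in this section.
    """
    for interval_s, retention_days in RETENTION_TIERS:
        # A tier applies only if it is not finer than the original collection
        # interval and the data is still within that tier's retention window.
        if interval_s >= collection_interval_s and age_days <= retention_days:
            return interval_s
    return None


print(finest_queryable_interval(60, 10))   # within 15 days: 1-minute granularity
print(finest_queryable_interval(60, 20))   # after 15 days: 5-minute granularity
print(finest_queryable_interval(60, 100))  # after 63 days: 1-hour granularity
print(finest_queryable_interval(60, 500))  # after 455 days: expired
```

For data older than 455 days the helper returns `None`, which corresponds to the case where data points must have been stored separately beforehand.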
Dimension
A dimension is a key-value pair that serves as a unique identifier for a metric and allows data points to be classified and filtered.
For example, you can identify metrics for a specific server by using the resource_id dimension of the Virtual Server’s metrics.
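As a rough illustration of how a dimension narrows a set of data points, the following sketch models a data point as a small Python class and filters by `resource_id`. The `DataPoint` class, the `filter_by_dimension` helper, and the sample IDs `vm-001` and `vm-002` are hypothetical and are not part of ServiceWatch or its OpenAPI.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DataPoint:
    """Hypothetical model of one data point: timestamp, value, and unit,
    plus the namespace, metric name, and dimensions it belongs to."""
    namespace: str
    metric: str
    dimensions: dict  # key-value pairs such as {"resource_id": "vm-001"}
    timestamp: datetime
    value: float
    unit: str


def filter_by_dimension(points, key, value):
    """Keep only the data points whose dimension `key` equals `value`."""
    return [p for p in points if p.dimensions.get(key) == value]


# Sample data; the resource IDs are made up for illustration.
now = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
points = [
    DataPoint("Virtual Server", "CPU usage", {"resource_id": "vm-001"}, now, 42.0, "%"),
    DataPoint("Virtual Server", "CPU usage", {"resource_id": "vm-002"}, now, 77.5, "%"),
]

selected = filter_by_dimension(points, "resource_id", "vm-001")
print([p.value for p in selected])
```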
Collection Interval
The collection interval is the interval at which data points are collected for each service's metrics. Each service provides metrics at its own predefined collection interval.
For the collection interval of a specific service, refer to that service's ServiceWatch metrics page.
For example, Virtual Server collects metrics every 5 minutes under basic monitoring and every 1 minute when detailed monitoring is enabled.
Statistics
A statistic is a method of aggregating metric data over a specified period. ServiceWatch provides aggregated statistics based on the metric data points that each service sends to ServiceWatch. Aggregation is performed by namespace, metric name, dimensions, and data point unit within the specified aggregation period.
The provided statistics are Total, Average, Minimum, and Maximum.
- Total: The sum of all data point values collected during the specified period
- Average: The sum of all data point values during the specified period divided by the number of data points in that period
- Minimum: The lowest value observed during the specified period
- Maximum: The highest value observed during the specified period
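The four statistics can be written out in a few lines. This is a minimal sketch of the definitions above, not the ServiceWatch implementation; the function name `summarize` and the sample values are hypothetical.

```python
def summarize(values):
    """Compute Total, Average, Minimum, and Maximum for the data point
    values that fall within one aggregation period."""
    if not values:
        # With no data points there is nothing to aggregate.
        raise ValueError("no data points in the period")
    total = sum(values)
    return {
        "Total": total,
        "Average": total / len(values),  # sum divided by number of data points
        "Minimum": min(values),
        "Maximum": max(values),
    }


# Five sample CPU usage values (%), for example one per minute.
stats = summarize([10.0, 20.0, 30.0, 40.0, 50.0])
print(stats)
```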
Unit
Each statistic has a measurement unit. Examples of units include Bytes, Second, Count, and Percent.
Aggregation Period
Each statistic is calculated from the metric data points collected during the selected aggregation period. The aggregation period can be 1 minute, 5 minutes, 15 minutes, 30 minutes, 1 hour, 3 hours, 6 hours, 12 hours, or 1 day, and the default is 5 minutes. The aggregation period is closely tied to the collection interval of the metric data points: to obtain correct aggregation results, the aggregation period must be longer than or equal to the collection interval.
For example, if you select Average as the statistic, choose an aggregation period of 5 minutes, and select a metric with a collection interval of 1 minute, data points are collected every minute and the average is calculated over the data points collected during each 5-minute period. Conversely, if the aggregation period is shorter than the collection interval, a valid aggregation result cannot be obtained.
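The relationship between the aggregation period and the collection interval can be sketched as follows. The function `aggregate_average` is hypothetical, and aligning periods to multiples of the period length in epoch seconds is an assumption made for this example; this page does not state how ServiceWatch aligns aggregation periods.

```python
from collections import defaultdict


def aggregate_average(points, period_s, collection_interval_s):
    """Average (timestamp_seconds, value) pairs per aggregation period.

    Raises ValueError if the aggregation period is shorter than the
    collection interval, since no valid result can be obtained then.
    """
    if period_s < collection_interval_s:
        raise ValueError("aggregation period must be >= collection interval")

    buckets = defaultdict(list)
    for ts, value in points:
        # Assumption: each period starts at a multiple of period_s.
        period_start = ts - (ts % period_s)
        buckets[period_start].append(value)

    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Six data points collected at a 1-minute interval (timestamps in seconds).
points = [(0, 10.0), (60, 20.0), (120, 30.0), (180, 40.0), (240, 50.0), (300, 60.0)]

# A 5-minute aggregation period over 1-minute data.
averages = aggregate_average(points, period_s=300, collection_interval_s=60)
print(averages)
```

Calling the same function with `period_s=60` and `collection_interval_s=300` raises `ValueError`, mirroring the case where the aggregation period is shorter than the collection interval.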
Downsampling is applied for long-term storage of metric data. For example, if data is collected at a 1-minute interval, after 15 days this data can only be queried in 5-minute increments. If you select an aggregation period between 5 minutes and 30 minutes for such metrics, it may take up to 5 minutes before the downsampled data can be retrieved correctly. After 63 days, the data is re-aggregated and provided in 1-hour intervals. At that point, with an aggregation period between 1 hour and 1 day, it may take up to 1 hour before the data can be retrieved correctly. These aggregation delays occur because aggregating metric data after downsampling takes time.
| Aggregation Period | Aggregation Delay |
|---|---|
| 1 minute | - |
| 5 minutes | up to 5 minutes |
| 15 minutes | up to 5 minutes |
| 30 minutes | up to 5 minutes |
| 1 hour | up to 1 hour |
| 3 hours | up to 1 hour |
| 6 hours | up to 1 hour |
| 12 hours | up to 1 hour |
| 1 day | up to 1 hour |
Alert
An alert policy evaluates a single metric over the specified evaluation range and sends an alert notification to the user when the metric meets the condition defined by the threshold.
The alert status is one of Alert, Normal, or Insufficient data.
- Alert: The metric meets the set condition
- Normal: The metric does not meet the set condition
- Insufficient data: The metric data does not exist, is missing, or has not yet arrived
When the alert status is Alert and a later evaluation no longer meets the condition, the status changes back to Normal.
For details about alerts, see the Alert item.
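The three alert states can be illustrated with a simple evaluation function for a condition such as "CPU usage >= 80% for 5 minutes". This is a sketch only: the function `evaluate_alert` is hypothetical, and two behaviors are assumptions rather than documented ServiceWatch behavior, namely that the condition must hold for every data point in the evaluation range and that missing points (represented here as `None`) are ignored when at least one value is present.

```python
def evaluate_alert(values, threshold):
    """Return the alert state for the values in one evaluation range.

    `values` holds one entry per expected data point; None marks a data
    point that is missing or has not yet arrived.
    """
    present = [v for v in values if v is not None]
    if not present:
        # No usable data in the evaluation range.
        return "Insufficient data"
    # Assumption: every present value must meet the condition (>= threshold).
    if all(v >= threshold for v in present):
        return "Alert"
    return "Normal"


# Five 1-minute CPU usage values (%), evaluated against a threshold of 80.
print(evaluate_alert([85, 90, 82, 88, 95], 80))  # condition met throughout
print(evaluate_alert([85, 70, 82, 88, 95], 80))  # one value below the threshold
print(evaluate_alert([None, None, None], 80))    # nothing has arrived yet
```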
Basic Monitoring and Detailed Monitoring
ServiceWatch provides two types of monitoring: basic monitoring and detailed monitoring.
Samsung Cloud Platform services integrated with ServiceWatch provide basic monitoring by publishing a basic set of metrics to ServiceWatch free of charge. Basic monitoring is enabled automatically as soon as you use any of these services, and the metrics can be viewed in ServiceWatch.
Detailed monitoring is only available for some services and incurs charges. To use detailed monitoring, you must enable it in the service details.
Detailed monitoring options vary by service.
- Basic monitoring of Virtual Server has a collection interval of 5 minutes. When detailed monitoring is enabled, the metrics provided by basic monitoring are collected at a 1-minute interval instead of 5 minutes.
The following table lists the services that provide detailed monitoring and the guide for each.
| Service | Guide |
|---|---|
| Virtual Server/GPU Server | Virtual Server Enable Detailed Monitoring |