Cloud Monitoring
- 1: Overview
- 2: How-to guides
- 2.1: Using Monitoring Dashboards
- 2.2: Performance Analysis
- 2.3: Log Analysis
- 2.4: Managing Events
- 2.5: Using Custom Dashboards
- 2.6: Managing Agents
- 2.7: Appendix A. Service-specific Monitoring Targets
- 2.8: Appendix B. Service-specific Performance Metrics
- 2.9: Appendix C. Service-specific Status Checks
- 3: API Reference
- 4: Release Note
1 - Overview
According to Samsung Cloud Platform’s policy, the Cloud Monitoring service is scheduled to be discontinued.
Accordingly, after the September 2026 release, resource monitoring of the Samsung Cloud Platform via Cloud Monitoring will no longer be possible.
With the new alternative service, you can continuously perform resource monitoring by using ServiceWatch, released in October 2025.
ServiceWatch provides more modern and powerful features, replacing Cloud Monitoring to deliver a smooth monitoring environment.
For details about ServiceWatch, refer to the ServiceWatch Overview.
※ For some Database and Data Analytics services, refer to the user guide of the respective service for the ServiceWatch implementation schedule.
| Service | User Guide |
|---|---|
| EPAS(DBaaS) | EPAS(DBaaS) > How-to guides |
| PostgreSQL(DBaaS) | PostgreSQL(DBaaS) > How-to guides |
| MariaDB(DBaaS) | MariaDB(DBaaS) > How-to guides |
| MySQL(DBaaS) | MySQL(DBaaS) > How-to guides |
| Microsoft SQL Server(DBaaS) | Microsoft SQL Server(DBaaS) > How-to guides |
| CacheStore(DBaaS) | CacheStore(DBaaS) > How-to guides |
| Event Streams | Event Streams > How-to guides |
| Search Engine | Search Engine > How-to guides |
| Vertica(DBaaS) | Vertica(DBaaS) > How-to guides |
Service Overview
The Cloud Monitoring service collects usage status, change information, and logs of operational infrastructure resources, and generates an event to notify when a configured threshold is exceeded. Through this, users can respond quickly to performance degradation and failures, and can conveniently develop resource capacity expansion plans to configure a stable computing environment.
Provided Features
Cloud Monitoring provides the following features.
- Stable computing resource management: You can easily view metrics such as CPU usage, disk usage, and memory usage. When an event occurs in the resources being used, an automatic notification is sent to the designated recipients, enabling rapid fault analysis and response, so computing resources can be operated reliably.
- Convenient Monitoring: You can easily monitor resource status information by creating a dashboard. Default and custom dashboards are provided, so you can configure various widget types and build dashboards quickly yourself.
- Event Metric Management: Through the web-based Console, you can easily set event metrics with just a few clicks. The event metric settings for the monitoring target (such as event patterns, trigger conditions, occurrence frequency, performance metrics, operational status, etc.) can be varied to suit the usage environment, and threshold and alarm configurations can be managed conveniently.
- Resource Log Management: Collects and stores resource log data and lets you search the target logs when needed. In addition, events for major keywords are converted into metrics, and the designated recipients are notified automatically when preset conditions are met, providing a more stable usage environment.
Component
Dashboard
In the monitoring dashboard, you can view the operational status and event status of monitored services and resources, as well as the top usage items.
| Item | Explanation |
|---|---|
| Region | Resource location |
| Data reference time | Reference time of the data displayed on the dashboard |
| Refresh | Refresh the dashboard based on the current time |
| Period setting | Set data query period and refresh interval |
| Monitoring status | Number and status of monitoring targets for each service used in the Account |
| Event History | Display events that occurred in the past 7 days as a graph by risk level. |
| Top 5 usage rates by performance | Display the top five monitoring targets with the highest usage for each major performance metric |
| Event map | Display the number of events per service by severity |
| Event status | Display the list of unprocessed events among the occurred events |
Performance Analysis
In performance analysis, you can identify the main performance metrics of the monitoring target and view the current data and historical records within the period for each metric. Users can view the performance status of the monitoring targets they manage by service or by period, and compare specific performance metrics to analyze the results.
Log Analysis
In log analysis, you collect the logs of the monitoring target, examine their contents, and convert them into metrics—structured data—for monitoring. Each monitoring target provides a default collection log, and users can create custom logs to collect and view additional logs as needed.
Event Management
An event is a configuration that notifies the user when a monitoring target’s performance value meets a specific condition. By configuring events, you can capture essential monitoring information that users need to know without missing it. For example, if you configure events to trigger whenever a performance metric related to overload exceeds a certain threshold, users will receive notifications each time there is a risk of overload during resource operation. Users can proactively respond before problems arise based on this. In event management, you can create such events and configure them to notify designated users whenever a specific value occurs during monitoring.
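The threshold behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not the Cloud Monitoring implementation: the `EventRule` class, the `evaluate` function, and the `cpu_usage` metric name are hypothetical, and the sketch assumes an event fires when the value is strictly greater than the threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Severity grades named in this guide, ordered from lowest to highest.
SEVERITIES = ("INFORM", "WARNING", "FATAL")


@dataclass
class EventRule:
    metric: str       # hypothetical metric name, e.g. "cpu_usage"
    threshold: float  # value above which the event fires (assumption)
    severity: str     # one of SEVERITIES


def evaluate(rule: EventRule, value: float) -> Optional[str]:
    """Return the rule's severity if the value exceeds the threshold, else None."""
    if rule.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {rule.severity}")
    return rule.severity if value > rule.threshold else None


rule = EventRule(metric="cpu_usage", threshold=90.0, severity="WARNING")
print(evaluate(rule, 95.2))  # the event fires
print(evaluate(rule, 41.0))  # no event
```

In the Console, the same rule is configured through the event settings rather than in code.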
Preceding Service
Cloud Monitoring has no prerequisite services.
2 - How-to guides
Samsung Cloud Platform Monitoring is a resource management system that can monitor and analyze the resource operation status within an account operated in the Samsung Cloud Platform Console. Users can efficiently manage resources by using the dashboard page, widgets, and chart features.
- The user can monitor resources created on an Account with permissions in the Samsung Cloud Platform Console.
- The user can log in to the Samsung Cloud Platform Console and navigate to Samsung Cloud Platform Monitoring to monitor.
Cloud Monitoring Getting Started
To start Samsung Cloud Platform Monitoring, follow these steps.
- Click the All Services > Management > Cloud Monitoring menu. You are taken to the Service Home page of Cloud Monitoring.
- On the Service Home page, click the Open Cloud Monitoring button. You are taken to the Cloud Monitoring Console page.
Explore Cloud Monitoring Console
The top and left menus of the Cloud Monitoring Console are organized as follows.
| Category | Detailed description |
|---|---|
| Custom Dashboard Management | Custom Dashboard |
| Support | Support |
| Region List | Region list |
| User Information | You can view user information and log out from Samsung Cloud Platform Monitoring. |
| Side menu | Displays the main features of Samsung Cloud Platform Monitoring. Clicking each menu takes you to the corresponding page. |
Stop Monitoring
To exit the Cloud Monitoring Console, click the User Info > Logout button at the top right.
Using Common Features
This section explains the features frequently used in the Cloud Monitoring Console.
View detailed information of the monitoring target
When you open Cloud Monitoring Console > Performance Analysis or Cloud Monitoring Console > Log Analysis > Log Overview, the list of monitoring targets is displayed. To view detailed information for a monitoring target, click the desired target in the list.
- Detailed information of the monitoring target varies depending on the service type.
- If the operating system of the monitoring target is RHCOS (Red Hat Enterprise Linux CoreOS), detailed information about the monitoring target is not provided.
| Item | Explanation |
|---|---|
| Basic Information | Display basic information about the monitoring target |
| Performance | Display the primary performance of the monitoring target in a graph |
| Log | Show the log collection volume for the monitoring target in a graph |
| Event | Display the list of events that occurred in the monitoring target |
| Agent | Provides the agent's install, start, stop, delete, and update commands |
| Set query period | Displays the reference date/time for data retrieval |
| Monitoring status area | Displays performance, log, and event monitoring status |
- The services that provide agent management commands are Virtual Server, GPU Server, and Bare Metal Server.
- For detailed information on installing and managing the agent, see Managing Agents.
Sorting data
You can organize and view information such as event monitoring, performance, and log analysis results in descending or ascending order. To sort the data, follow these steps.
- Display the information to be verified on the page.
- Click the Sort button next to the category name. Each click toggles the sorting order between descending and ascending.
Check real-time data
You can configure the dashboard or detail page data to automatically refresh at a set interval.
- In the Cloud Monitoring Console, you can configure whether to enable refresh and set the refresh interval so that the monitoring page refreshes periodically.
- Click the refresh button to manually refresh based on the current time.
To set the data refresh interval, follow these steps.
- Click the Settings button at the top right of the data display area.
- After selecting the refresh interval, click the Confirm button.
- You can turn the refresh feature on or off.
Configure the query period
By setting the query period, you can limit the query scope to the specified range of performance, logs, and events, making it easy to find only the information you need. To set the query period, follow these steps.
- Click the Settings button in the upper right of the data display area.
- Select a date range or enter it manually.
- If you manually enter the query period, you must set the period to at least 30 minutes.
- If each widget’s data query range is fixed, the widget’s query range takes precedence.
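The 30-minute minimum for a manually entered query period can be expressed as a simple check. The sketch below is illustrative only; `validate_query_period` is a hypothetical helper, not part of any Cloud Monitoring API.

```python
from datetime import datetime, timedelta

# Minimum manually entered query period stated in this guide.
MIN_QUERY_PERIOD = timedelta(minutes=30)


def validate_query_period(start: datetime, end: datetime) -> timedelta:
    """Return the period length, or raise if it is shorter than 30 minutes."""
    if end <= start:
        raise ValueError("end must be later than start")
    period = end - start
    if period < MIN_QUERY_PERIOD:
        raise ValueError("query period must be at least 30 minutes")
    return period


start = datetime(2025, 1, 1, 9, 0)
print(validate_query_period(start, start + timedelta(hours=3)))
```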
2.1 - Using Monitoring Dashboards
In the monitoring dashboard, you can view the operational status and event status of monitored services and resources, as well as the top usage items.
Getting Started with Monitoring Dashboard
When you navigate from the Samsung Cloud Platform Console to the Cloud Monitoring Console page, the monitoring dashboard is displayed. If you are on a different page, click Cloud Monitoring Console > Monitoring Dashboard to go to the Monitoring Dashboard page.
The monitoring dashboard is structured as follows.
| Item | description |
|---|---|
| Data reference time | Display the reference time for the data shown on the dashboard |
| Refresh | Refresh the dashboard based on the current time |
| Automatic refresh | You can enable or disable the dashboard refresh feature. |
| Period setting | Set the data retrieval period or change the refresh interval |
| Monitoring Status | Display the number of monitoring targets and monitoring status for each service |
| Event History | Display the number of events that occurred in the last 7 days as a graph by severity. |
| Top 5 usage rates by performance | Display the usage rates of the five monitoring targets with the highest usage for each major performance metric as a usage graph. |
| Event map | Display the number of events per service by severity |
| Event status | Display the list of unprocessed events among the occurred events. |
- The monitoring dashboard is automatically created when an Account is created in the Samsung Cloud Platform Console and cannot be deleted arbitrarily.
- Configuration widgets on the monitoring dashboard cannot be modified arbitrarily.
- To create a dashboard with a specific widget, use a custom dashboard. For more information about custom dashboards, refer to Using Custom Dashboards.
Explore Common Dashboard Features
This describes the functions available on the dashboard.
Download widget image
Click the download button at the top right of the widget area to download the widget as an image file (*.png).
View graph details
When you place the mouse cursor over the graph, detailed information appears as a popup.
Monitoring Status
Shows the number of monitoring targets and their monitoring status for each service in use.
| Item | Description |
|---|---|
| Service Category | Display the monitoring target service categories and the number of monitoring targets included in each service category |
| Service List | Display the list and number of services included in the monitoring target service category |
| Monitoring status | Displays the number of monitoring targets and their current status |
| Event status | Displays the number of current events by grade (Fatal, Warning, Inform). |
- The performance collection figure in the monitoring status is the combined number of performance metrics collected through both the Agent and Agentless methods.
Event History
Displays the number of events that occurred in the last 7 days as a graph by severity.
When you place the mouse cursor over the graph, a popup shows the number of events that occurred on the selected date for each risk level, along with their active/inactive counts.
- Occurrence: total number of event occurrences
- Activation: The state where an event that has occurred by meeting the event trigger conditions continues to be maintained.
- Deactivation: The event that occurred no longer meets the event trigger conditions and has returned to a normal state
You can click the risk legend area to hide or unhide the corresponding graph.
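As a rough illustration of how the occurrence, activation, and deactivation counts relate, the following sketch tallies one day's events per risk level. The input format (a list of `(severity, is_active)` tuples) and the `summarize` function are assumptions made for this example; the Console computes these values itself.

```python
from collections import Counter


def summarize(events):
    """Tally events per severity.

    events: iterable of (severity, is_active) tuples for one day
    (a hypothetical input format chosen for this sketch).
    """
    summary = {}
    for severity, is_active in events:
        counts = summary.setdefault(severity, Counter())
        counts["occurrence"] += 1  # every event counts as an occurrence
        counts["active" if is_active else "inactive"] += 1
    return {severity: dict(counts) for severity, counts in summary.items()}


day = [("FATAL", True), ("FATAL", False), ("WARNING", True)]
print(summarize(day))
```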
Top 5 Usage by Performance
Displays a usage graph for the five monitoring targets with the highest utilization rates across major performance categories.
- When you place the mouse cursor over the graph, a popup displays the full name of the selected item and its current performance metrics.
- When you click the graph, a Monitoring Target Details popup window for the corresponding item opens.
| Item | Description |
|---|---|
| CPU Usage/Core [Basic] | Percentage of CPU time used, excluding Idle and IOWait states |
| Memory Used [Basic] | Current memory usage |
| Disk Read Bytes [Basic] | Disk read byte count |
| Disk Write Bytes [Basic] | Disk write byte count |
- The monitoring dashboard only displays the performance of Virtual Server. To show the Top 5 performance of other service types, you need to select and configure them in a custom dashboard.
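The Top 5 selection amounts to picking the five largest values of one metric across monitoring targets. A minimal sketch, assuming usage values are already available as a mapping from a target name to a number (the target names and the `top_n` helper are hypothetical):

```python
import heapq


def top_n(usage_by_target, n=5):
    """Return the n (target, usage) pairs with the highest usage, highest first."""
    return heapq.nlargest(n, usage_by_target.items(), key=lambda item: item[1])


cpu_usage = {
    "vm-a": 91.0, "vm-b": 35.5, "vm-c": 78.2,
    "vm-d": 64.0, "vm-e": 12.3, "vm-f": 88.8,
}
for target, usage in top_n(cpu_usage):
    print(f"{target}: {usage}")
```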
Event Map
Displays the number of events per service by severity.
- When you place the mouse cursor over the rectangle, the name of the monitoring target appears as a popup.
- When you click a service item on the event map, the Monitoring Target Details popup window opens.
The risk level for each item is as follows.
| Item | description |
|---|---|
| No Rule | The condition cannot be classified as normal or abnormal. This indicates that the status cannot be assessed due to the absence of a threshold setting. |
| NORMAL | The target is in a normal state. The measured value did not reach the configured threshold, so no event was generated. |
| INFORM | This is the lowest level of risk status, including information at a simple notification level. |
| WARNING | It is a moderate risk condition. |
| FATAL | This is the most dangerous stage. |
Event Status
Displays a list of events that are in an active state among the generated events.
- Events are displayed in order of most recent occurrence.
2.2 - Performance Analysis
In performance analysis, you can view the key performance metrics of the monitoring target and check both the current data and historical data within the period for each metric. Users can view the performance status of the monitored targets they manage by service or by period, and compare specific performance metrics to analyze the results.
Getting Started with Performance Analysis
You can start performance analysis by directly selecting the monitoring target or entering search criteria. To search for the monitoring target and analyze performance, follow these steps.
- Click Cloud Monitoring Console > Performance Analysis. You will be taken to the Performance Analysis page.
- After entering the search criteria for the monitoring target to be analyzed in the search area, click Search.
| Item | Description |
|---|---|
| Search area | The detailed search filters displayed in the search area vary according to the service type. To perform a detailed search, click the Detailed Search button. One or more items can be selected for each detailed search filter condition. |
| Number of monitoring targets displayed | Displays the number of search results and the number of performance items that can be viewed at once in the list. The default is 20 per page. The list display count can be changed to 10, 20, 30, 40, 50, or 100 per page. |
| Search information | Displays the search result values for the search criteria items: monitoring target, service status, and event grade. Clicking the risk icon displayed for the event grade opens a detailed popup of the most recent event corresponding to that risk level. |
| Performance metrics information | Displays key performance indicators according to the service type of the monitoring target. For the list of key performance indicators per service, refer to the service-specific key performance indicators and the information collected by instance type and status of the DB service. |
| View Details | View detailed information of the relevant monitoring target |
| Performance Comparison | Select monitoring targets and compare their performance |

Table. Performance analysis
Check detailed performance information
To view detailed performance information of the monitoring target, follow these steps.
- In the performance analysis list, click the monitoring target for which you want to view detailed information. The Monitoring Details popup window opens.
- Click the Performance tab.
- When you place the mouse cursor over the graph, the values of each performance metric appear in a popup window.
- Click the icon in the upper right corner to set the query period or change the refresh interval.
- You can click the Details, Summary buttons located at the top left of the performance chart to select the graph display method.
| Item | Description |
|---|---|
| Basic Information | Display basic information about the monitoring target |
| Details | Performance charts of the monitoring target are expanded and displayed. View a single chart in detail. |
| Summary | Performance charts of the monitoring target are displayed in a grid layout. View multiple charts at a glance. |
| Set query period | Date/Time: displays the reference date and time for data retrieval. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature on or off. Settings: sets the data query period or changes the automatic refresh interval. |
| Performance Comparison | Generate a chart that compares the performance of monitoring targets, allowing each performance to be compared |
| Performance chart | Performance charts of the monitoring target are displayed as graphs. When there is a single graph, the most recent collected value is shown in the upper right corner with its unit. When multiple graphs are present, an ⓘ appears in the upper right corner, and hovering the mouse cursor over it displays the latest collected value for each graph in a popup. Hovering the mouse cursor over a graph shows the performance metric value at the specified time in a popup. |

Table. Monitoring Target Details
- The collection interval of performance metrics may vary depending on the service.
- The chart displays data at 30 points, and the interval between points depends on the data query range (time) as follows. (The displayed points may vary because of differences in collection time.)
30 minutes: approximately 1‑minute intervals
60 minutes: approximately 2‑minute intervals
3 hours: approximately 6‑minute intervals
6 hours: approximately 12‑minute intervals
12 hours: approximately 24‑minute intervals
24 hours: approximately 48‑minute intervals
3 days: approximately 144‑minute intervals (2 hours 24 minutes)
7 days: approximately 336‑minute intervals (5 hours 36 minutes)
14 days: approximately 672‑minute intervals (11 hours 12 minutes)
Custom: the custom range (in minutes) divided by 30
- The data for each point represents the maximum value within the query range (time), and you can change the statistical type in the detailed chart.
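Every interval listed above is the query range divided by the 30 chart points. The arithmetic can be checked with a short sketch; `point_interval_minutes` is a hypothetical helper written for this example, and real charts may differ slightly because of collection timing.

```python
CHART_POINTS = 30  # number of points displayed per chart, as stated in this guide


def point_interval_minutes(query_range_minutes: float) -> float:
    """Approximate interval between chart points for a given query range."""
    if query_range_minutes < 30:
        raise ValueError("the query range must be at least 30 minutes")
    return query_range_minutes / CHART_POINTS


for label, minutes in [("30 minutes", 30), ("24 hours", 24 * 60), ("14 days", 14 * 24 * 60)]:
    print(f"{label}: about {point_interval_minutes(minutes):g}-minute intervals")
```

For a custom range of 90 minutes, for example, the same division gives an interval of about 3 minutes.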
Compare performance
You can view the performance metrics of each monitoring target and select the desired metrics for comparison.
Getting Started with Performance Comparison
Generate a chart that compares the performance of monitoring targets, allowing you to compare each performance.
- Only performance metrics of the same service type can be compared.
- Performance items may be added based on the detailed attributes of the service type, for example the performance of Windows OS on a VM and the Kibana-related performance of Search Engine.
To begin the performance comparison, follow these steps.
- Click Cloud Monitoring Console > Performance Analysis. You will be taken to the Performance Analysis page.
- After entering the search criteria for the monitoring target to be analyzed in the search area, click Search.
- After selecting all monitoring targets to compare, click Compare Performance. A popup window that allows performance comparison opens.

| Item | Description |
|---|---|
| Monitoring target | Displays the service type of the monitoring targets to compare; click it to change the service. Changing the service removes all charts created so far. Click Add to search for and add monitoring targets of the currently selected service. The selected monitoring targets are displayed on the page, and you can delete them by clicking X or Delete All. |
| Performance items | Displays all performance metrics collected from the currently selected service. Check the items you want to compare, and those performance items are included in the chart. |
| Chart display method | Select the display method for the performance comparison chart. Detailed: the chart is displayed in detail (default). Summary: the chart is displayed in summary. |
| Set query period | Date/Time: displays the reference date and time for data retrieval. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature on or off. Settings: sets the data query period or changes the automatic refresh interval. |
| Chart area | Compares the performance of monitoring targets based on the selected performance metric and displays it as a chart |

- Click Add. A popup window opens where you can add a monitoring target.
- After selecting the monitoring target to compare, click the Confirm button. If you select Kubernetes Engine, you must also select the sub-type of that service.
- Select the performance metrics to compare. The selected metrics are added to the chart.
Explore the chart
The performance comparison results are displayed as a chart. Users can modify the shape of the generated chart or download it as an image or Excel file.
- When you place the mouse cursor over the graph, the performance metric value for the specified time appears as a popup.
- Click a target item in the legend area to hide or unhide the corresponding graph.
| Item | Description |
|---|---|
| Statistical methods | Set the statistical method to display in the graph. Statistics are displayed in a graph for a period ranging from a minimum of 5 minutes to a maximum of 6 hours. Default, Maximum, Minimum, Average, and Total can be selected. Multiple methods can be selected simultaneously, and the selected items are shown in the legend area. |
| Chart format | Select the type of graph to display on the chart. Line: line graph. Stacked Area: area graph. Scatter: scatter plot. |
| Download chart | Check and download the chart's raw data. Chart PNG File: download the chart as an image file (PNG). Chart Excel File: download the performance item data displayed in the chart as an Excel file; the chart's displayed data is a dataset automatically collected based on the query range. Raw Excel File: collect the entire performance item data shown in the chart within the query period and download it as an Excel file. |
| Add time series graph widget | Add the chart to the custom dashboard as a time series graph widget. When you click it, a popup window for adding a time series graph widget opens. |
| Delete | Delete the performance comparison result chart |
| Performance Comparison Status | Display performance comparison results as a graph. When you place the mouse cursor over the graph, the performance comparison status for that time period is shown in a popup window. |
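To illustrate what the Maximum, Minimum, Average, and Total statistical methods produce, the sketch below applies them to one window of collected samples. It is an assumption-laden example: the `aggregate` helper and the sample values are hypothetical, and the Default method is omitted because this guide does not define how it is calculated.

```python
from statistics import mean

# Statistical methods named in this guide ("Default" is intentionally left out).
STATISTICS = {
    "Maximum": max,
    "Minimum": min,
    "Average": mean,
    "Total": sum,
}


def aggregate(samples, methods):
    """Apply each selected statistical method to one window of samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    return {method: STATISTICS[method](samples) for method in methods}


window = [10, 20, 30]  # hypothetical values collected within one window
print(aggregate(window, ["Maximum", "Average", "Total"]))
```

Selecting several methods at once, as the Console allows, corresponds to passing several names in `methods`.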
2.3 - Log Analysis
In log analysis, you collect the logs of the monitoring target, review their contents, and convert them into structured metrics for monitoring. Each monitoring target provides default collected logs, and users can create custom logs to collect and view additional logs as needed.
- To use log analysis, you must first install and operate a log collection agent. For detailed information on installing and operating the log agent, see Managing Agents.
- To collect logs from Kubernetes Engine, you must configure log collection in the Samsung Cloud Platform Console.
Getting Started with Log Analysis
You can view the log status list or search for logs to be monitored to check them. To view the log status list, follow these steps.
- Click Cloud Monitoring Console > Log Analysis > Log Overview. You will be taken to the Log Overview page.
- After entering the search criteria for the service to be analyzed in the search area, click Log Search.
- The list of services that match the search criteria and the search information are displayed at the bottom.
- Click the View Details button for each service to display that service’s detailed log information.
| Item | Description |
|---|---|
| Search area | The search filters displayed in the search area vary depending on the service type. To perform an advanced search, click the Advanced Search button. You can select one or more condition items for each advanced search filter. |
| Number of monitoring targets displayed | The number of search results and the number of items displayed at once in the list. The default is 20 items per page. The list display count can be changed to 10, 20, 30, 40, 50, or 100 items per page. |
| Search information | Display the search result values for the search criteria items |
| View Details | View detailed information of the relevant monitoring target |
| Log Search | Combine keywords and queries to search logs and view detailed information |
- If a Virtual Server or Node is connected to the monitoring target, the corresponding status is also displayed in the search information area.
- The name of the monitoring target can include Korean characters, English letters (both uppercase and lowercase), numbers, and special symbols (-, _, .), and can be up to 100 characters long.
- When you do not have permission for a monitoring target, information about the unauthorized target and a permission verification message are displayed in a popup.
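The naming rule for monitoring targets can be approximated with a regular expression. This is a sketch under stated assumptions: "Korean characters" is taken to mean precomposed Hangul syllables (가 to 힣), an empty name is rejected, and `is_valid_target_name` is a hypothetical helper rather than a Console function.

```python
import re

# Allowed: Hangul syllables, English letters, digits, and the symbols - _ .
# Length: 1 to 100 characters (the lower bound is an assumption).
NAME_PATTERN = re.compile(r"[가-힣A-Za-z0-9._-]{1,100}")


def is_valid_target_name(name: str) -> bool:
    """Return True if the whole name matches the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_target_name("web-서버_01.prod"))
print(is_valid_target_name("bad name!"))
```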
Check detailed log information
You can view detailed log entries and log graphs of the monitoring target.
Check log list
You can view detailed log information in the monitoring detail popup window. To view detailed monitoring information for a log, follow these steps.
- Click Cloud Monitoring Console > Log Analysis > Log Overview. The Log Overview page opens.
- On the Log Overview page, click the log for which you want to view detailed information. The Monitoring Details popup window opens.
- Click the Log tab.
- When you place the mouse cursor over the graph, the values of each log entry appear in a popup window.
- Click the icon in the upper right corner to set the query period or change the refresh interval.
- You can select the graph display method by clicking the Details, Summary buttons located at the top left of each log chart.
| Item | Explanation |
|---|---|
| Basic Information | Display basic information about the monitoring target |
| Details | Charts for each log of the monitoring target are expanded and displayed. View a single chart in detail. |
| Summary | Charts for each log of the monitoring target are displayed in a grid layout. View multiple charts at a glance. |
| Set query period | Date/Time: displays the reference date and time for data retrieval. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature on or off. Settings: sets the data query period or changes the automatic refresh interval. |
| Log Search | Combine keywords and queries to search logs and view detailed information |
| Log chart | Charts for each log of the monitoring target are displayed as graphs. When you place the mouse cursor over the graph, the log entry value at the specified time appears in a popup window. |
Search logs to verify
You can combine keywords and queries to search logs and view detailed information.
To search the logs, follow these steps.
- Click Cloud Monitoring Console > Log Analysis > Log Overview. You will be taken to the Log Overview page.
- On the Log Overview page, click Log Search. You will be taken to the Log Search page.

| Item | Explanation |
|---|---|
| Monitoring target | Displays the service type of the monitoring target. Click the monitoring target list to change the service. Changing the service removes all charts created so far. Click the Add button to search for and add monitoring targets of the currently selected service. The selected monitoring targets are displayed on the page, and you can delete a monitoring target by clicking X or Delete All. |
| Search criteria | Set conditions for the logs to be searched |
| Set query period | Date/Time: displays the reference date and time for data retrieval. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature on or off. Settings: sets the data query period or changes the automatic refresh interval. |
| Log volume graph | When you search the logs, the log entries that match the entered criteria are displayed as a chart |
| Generated log message | Log messages from the monitoring target are displayed by time |

- Click the Add button. A popup window opens where you can add a monitoring target.
- After clicking the monitoring target, select the log file you want to add.
- After selecting the log file, click the Confirm button.
- After entering the search criteria, click the Search button. The search results are displayed in the log volume graph and the log messages.
| Item | Description |
|---|---|
| Add indicator | Adds a metric based on the log search results. Use after searching logs. |
| Execution History | Shows the list of recently executed search criteria. The execution history displays up to the last 20 executed search criteria. You can select a history entry and load it as the current search criteria. |
| Search field | Select the field to search. |
| Condition | Select the search condition. You can select like, !like, =, !=, <=, >=, >, or <. |
| Search value | Enter the keyword to search. |
| Operator | Select the operator (AND, OR) for the newly added search condition. Displayed only when a new search condition is added. |
| Add condition | Adds a new search condition. |

When you search the logs, the log entries that match the entered criteria are displayed as a chart.
- Log entries are displayed in seconds.
| Item | Description |
|---|---|
| Log volume graph | The log volume over the selected period is displayed as a graph. When you hover the mouse cursor over the graph, the values of each log entry appear in a popup window. Clicking a bar in the graph displays the list of logs for that point in time. |
| Set query period | Date/Time: displays the reference date and time for data lookup. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature off or on. Settings: sets the data query period or changes the automatic refresh interval. |
| Monitoring target | The monitoring target list is displayed. When you select a monitoring target, its log messages are shown in the log list. |
| Log list | Log messages generated from the monitoring target are displayed by time. Click the button in the log list to view the full message of that log. Click Download to save the currently displayed log messages in Excel and TXT file formats. |

- Log entries are displayed in seconds.
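The way search conditions combine can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: the function name `matches`, the record layout, the left-to-right evaluation of AND/OR, and the treatment of `like` as a substring test are all assumptions made for the example.

```python
import operator


def _numeric(op):
    """Wrap a comparison so both sides are compared as numbers."""
    def check(field_value, search_value):
        try:
            return op(float(field_value), float(search_value))
        except ValueError:
            return False  # non-numeric text never satisfies a numeric condition
    return check


# Assumed meaning of each condition offered in the Condition list.
CONDITIONS = {
    "like": lambda field_value, search_value: search_value in field_value,
    "!like": lambda field_value, search_value: search_value not in field_value,
    "=": lambda field_value, search_value: field_value == search_value,
    "!=": lambda field_value, search_value: field_value != search_value,
    "<=": _numeric(operator.le),
    ">=": _numeric(operator.ge),
    ">": _numeric(operator.gt),
    "<": _numeric(operator.lt),
}


def matches(record, criteria):
    """Return True if a log record satisfies the search criteria.

    criteria is a list of (operator, field, condition, value) tuples.
    The operator of the first tuple is ignored; later ones are "AND" or "OR"
    and are applied left to right (an assumption of this sketch).
    """
    result = None
    for joiner, field, condition, value in criteria:
        hit = CONDITIONS[condition](record.get(field, ""), value)
        if result is None:
            result = hit
        elif joiner == "AND":
            result = result and hit
        elif joiner == "OR":
            result = result or hit
        else:
            raise ValueError(f"unknown operator: {joiner!r}")
    return True if result is None else result
```

For example, a record such as `{"message": "disk usage high", "status": "503"}` satisfies the criteria `message like disk AND status >= 500`.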
Check log collection status
You can view the main log collection information for the past 7 days as a chart.
- When you place the mouse cursor over the graph, detailed information appears in a popup window.
- Only collected logs are aggregated, and logs that have not been collected are not displayed in the status.
- When you create an Account, we provide a default virtual capacity of 1 GB to store the collected logs.
- Collection of any log can be stopped and restarted as needed.
To check the log collection status, click Cloud Monitoring Console > Log Analysis > Log Collection Dashboard.
| Item | Description |
|---|---|
| Cumulative log volume | Displays the amount of logs collected since the 1st of each month, in GB. |
| Log collection volume for the past 7 days | Displays the amount of logs collected over the past 7 days by service type as a graph. |
| Log occurrence rate by service | Displays the logs collected over the past 7 days, categorized by service. |
| Log Collection Top 10 | Displays a graph of the 10 monitoring targets that collected the most logs over the past 7 days within the service selected in Log occurrence rate by service. |
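As a rough illustration of the Top 10 aggregation described in the table, the following Python sketch sums collected log sizes per monitoring target over a 7-day window and returns the largest ones. The function name, the sample format, and the exact window boundaries (the 7 calendar days ending today) are assumptions for the example, not documented behavior.

```python
from collections import Counter
from datetime import date, timedelta


def top_log_sources(samples, today, days=7, limit=10):
    """Rank monitoring targets by collected log volume.

    samples: iterable of (day, target_name, collected_bytes) tuples.
    Only days in the window (today - days, today] are counted.
    """
    cutoff = today - timedelta(days=days)
    totals = Counter()
    for day, target, size in samples:
        if cutoff < day <= today:
            totals[target] += size
    # most_common() sorts by total volume, largest first.
    return totals.most_common(limit)
```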
To perform monitoring related to logs, you must first install and operate a log collection agent. For detailed information on installing and operating the log agent, refer to Managing Agents.
- Accumulated logs are stored up to a maximum of 1 GB. If it exceeds 1 GB, older logs are automatically deleted.
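The retention rule above (oldest logs are removed once the 1 GB limit is exceeded) behaves like a size-bounded queue. The sketch below is a simplified model under stated assumptions: the class name `LogStore` is hypothetical, 1 GB is taken as 1024³ bytes, and size is measured as the UTF-8 length of each message. The service's actual accounting may differ.

```python
from collections import deque

GIB = 1024 ** 3  # assumption: 1 GB is interpreted as 2**30 bytes


class LogStore:
    """Keeps log messages in arrival order and evicts the oldest when full."""

    def __init__(self, capacity_bytes=GIB):
        self.capacity = capacity_bytes
        self.entries = deque()  # (timestamp, message, size_in_bytes)
        self.used = 0

    def append(self, timestamp, message):
        size = len(message.encode("utf-8"))
        self.entries.append((timestamp, message, size))
        self.used += size
        # Drop the oldest entries until the store fits the capacity again.
        while self.used > self.capacity and self.entries:
            _, _, old_size = self.entries.popleft()
            self.used -= old_size
```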
Check metric configuration status
You can create a metric to display the occurrence count of log patterns as a time series. To view the metric list, click Cloud Monitoring Console > Log Analysis > Metric Configuration Status.
| Item | Description |
|---|---|
| Search area | The search filters displayed in the search area vary depending on the service type. |
| Number of monitoring targets displayed | Displays the number of search results. |
| Search information | Displays the search result values for the search criteria items. |
| Add | Adds a new metric. |
| Delete | Deletes the metric selected in the search information. |
Check detailed metric information
Follow these steps to view detailed information about a metric.
- Click Cloud Monitoring Console > Log Analysis > Metric Configuration Status. You will be taken to the Metric Configuration Status page.
- On the Metric Configuration Status page, click the name of the metric whose details you want to view. The Indicator Details popup opens.
Add metric
You can add a new metric to display the desired log data as a time series.
- Log metrics can only be set for monitoring targets where the log agent is installed or logs are being collected. For detailed information on installing and operating the log agent, refer to Managing Agents.
To add a new metric, follow these steps.
Click Cloud Monitoring Console > Log Analysis > Metric Configuration Status. You will be taken to the Metric Configuration Status page.
On the Metric Configuration Status page, click the Add button. The Add Indicator popup opens.
Enter the metric name.
- Metric names can only use English uppercase and lowercase letters, underscores (_), periods (.), and hyphens (-).
- To distinguish log metrics from general performance metrics, the prefix "metricfilter." is automatically added and cannot be removed or changed.

| Item | Description |
|---|---|
| Indicator Name | Enter the name of the metric to create. |
| Monitoring Target | Displays the service type of the monitoring target. Click the monitoring target list to change the service. Changing the service removes all charts created so far. Click the Add button to search for and add monitoring targets of the currently selected service. The selected monitoring targets are displayed on the page, and you can delete a monitoring target by clicking X or Delete All. |
| Search Criteria | Set the conditions for the logs to be searched. |
| Set Query Period | Date/Time: displays the reference date and time for data retrieval. Refresh: manually refreshes to the current time. Start/Stop: turns the automatic refresh feature off or on. Settings: sets the data query period or changes the automatic refresh interval. |
| Log volume graph | When you search the logs, the log entries that match the entered criteria are displayed as a chart. |
| Generated log message | Log messages from the monitoring target are displayed by time. |

Click the Add button. A popup window opens where you can add a monitoring target.
After clicking the monitoring target, select the log file you want to add.
After selecting the log file, click the Confirm button.
After entering the search criteria, click the Search button. The search results will be displayed on the log volume graph and the generated log messages.
| Item | Description |
|---|---|
| Add indicator | Adds a metric based on the log search results. Use after searching logs. |
| Execution History | Shows the list of recently executed search criteria. The execution history displays up to the last 20 executed search criteria. You can select a history entry and load it as the current search criteria. |
| Search field | Select the field to search. |
| Condition | Select the search condition. You can select like, !like, =, !=, <=, >=, >, or <. |
| Search value | Enter the keyword to search. |
| Operator | Select the operator (AND, OR) for the newly added search condition. Displayed only when a new search condition is added. |
| Add condition | Adds a new search condition. |

Click the Confirm button. A new metric will be added with a toast popup message.
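The metric naming rule from the steps above can be checked with a short regular expression. The following Python sketch is illustrative only: the helper name `full_metric_name` is hypothetical, and it reads the rule literally, so digits are rejected because the text lists only letters, underscores, periods, and hyphens.

```python
import re

METRIC_PREFIX = "metricfilter."  # added automatically according to the guide

# Letters, underscore, period, and hyphen only (literal reading of the rule).
_NAME_PATTERN = re.compile(r"[A-Za-z_.\-]+")


def full_metric_name(user_part):
    """Validate the user-entered part and return the prefixed metric name."""
    if not _NAME_PATTERN.fullmatch(user_part):
        raise ValueError(f"invalid metric name: {user_part!r}")
    return METRIC_PREFIX + user_part
```

With this sketch, `full_metric_name("error_count")` yields `metricfilter.error_count`, while a name containing a space raises `ValueError`.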
Modify metric search criteria
To modify the search criteria of a metric, follow these steps.
- Click Cloud Monitoring Console > Log Analysis > Metric Configuration Status. You will be taken to the Metric Configuration Status page.
- On the Metric Configuration Status page, click the name of the metric you want to edit. The Indicator Details popup opens.
- In the Indicator Details popup, click the Edit button. The Edit Indicator popup opens.
- In the Edit Indicator popup, modify the search criteria, then click the Confirm button. The metric will be updated along with a toast popup message.
Delete metric
To delete a metric, follow these steps.
- If there are charts or event policies that use the metric you want to delete, you cannot delete that metric.
- Click Cloud Monitoring Console > Log Analysis > Metric Configuration Status. You will be taken to the Metric Configuration Status page.
- On the Metric Configuration Status page, select the metric to delete, then click the Delete button. The metric will be removed along with a toast popup message.
2.4 - Managing Events
An event is a setting that notifies the user when a performance metric of a monitoring target meets a specified condition. By configuring events, users avoid missing essential monitoring information. For example, if you set an event to trigger whenever a performance value related to overload exceeds a certain threshold, a notification is sent to the user each time there is a risk of overload during resource operation. Users can then respond before problems occur.
In event management, you can create such events and configure them to notify designated users whenever a specific value occurs during monitoring.
Check Event Status
In the Event Status, you can view information about all generated events, related performance metrics, and the history of event notifications delivered to users. To view the Event Status list, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Status. You will be taken to the Event Status page.
- On the Event Status page, enter the search criteria for the service whose event status you want to check in the search area, then click the Search button.
| Item | Description |
|---|---|
| Search area | The search filters displayed in the search area differ according to the service type. To perform an advanced search, click the Advanced Search button. You can select one or more condition items for each advanced search filter. |
| Number of monitoring targets displayed | Displays the number of search results and the number of items shown per page. The default is 20 items per page, and it can be changed to 10, 20, 30, 40, 50, or 100 items per page. |
| Search information | Displays the search result values for the search criteria items. Click the message content of each service to view detailed event information. |
| View Details | View detailed information of the relevant monitoring target. |

Table. Event List
- If a Virtual Server or Node is connected to the monitoring target, the corresponding status is also displayed in the search information area.
- The name of the monitoring target can include Korean characters, English letters (both uppercase and lowercase), numbers, and special symbols (-, _, .), and can be up to 100 characters long.
View event status list
In the monitoring detail popup, you can view the event information, occurrence time, and duration in the event list. To check the event occurrence status, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Status. You will be taken to the Event Status page.
- On the Event Status page, click the Event tab.
| Item | Description |
|---|---|
| Event status | Check the event message and occurrence time. |
| Active | Shows only events that are currently active. |
| All | Shows all events. |
| Event Details | Check the detailed information of the message selected in the event status. |

Table. Event tab
Check event details
To view the event details, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Status. You will be taken to the Event Status page.
- On the Event Status page, click the Event tab.
- On the Event Status page, select the event whose details you want to view, then click Event Details to view the event occurrence conditions, performance items, and notification history.

| Item | Description |
|---|---|
| Monitoring target | Displays the name of the monitoring target. |
| Occurrence condition | Displays the condition under which the event occurs. |
| Performance items | Displays a chart for the performance items. Hovering the mouse cursor over the graph shows detailed performance values for each time period. |
| Notification History | Displays the full alarm occurrence history. |
| Event Settings Details | View the configuration information of the event. |

Table. Event Details
Manage Event Settings
You can configure event details such as the monitoring target, the performance metrics that define the event trigger, the event severity level, and the event notification recipients. When data collected from the monitoring target meets the conditions set in the event policy, notifications are delivered to the user via email, SMS, or messenger.
- Event policies can be set only when a monitoring target is specified, and policies for each Auto-Scaling Group can be configured at the group level.
Check Event Settings
To verify the event settings, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Settings. Navigate to the Event Settings page.
- On the Event Settings page, enter the search criteria for the service whose event policy you want to check in the search area, then click the Search button.
| Item | Description |
|---|---|
| Search area | The search filters displayed in the search area vary depending on the service type. To perform an advanced search, click the Advanced Search button. You can select one or more condition items for each advanced search filter. |
| Number of monitoring targets displayed | Displays the number of search results. The default is 20 items per page, and it can be changed to 10, 20, 30, 40, 50, or 100 items per page. |
| Monitoring target | Displays the name of the monitoring target. When the checkbox is selected, the Delete, Activate, and Notification Recipient buttons become enabled. |
| Performance items | Displays the performance items for the event configuration target. |
| Individual item | Displays the individual performance items under the performance category. If there are no individual items, nothing is displayed. |
| Type/Unit | Displays the value type and unit of the performance item. |
| Event rating | Displays the risk level of the event, which the user sets manually when adding an event. Fatal: the highest risk level. Warning: a medium risk level. Information: the lowest risk level, for reference. |
| Threshold | Displays the reference value against which performance values are compared. |
| Notification recipients | Displays the recipients of the event notification. When the mouse cursor is placed over the name, the full list is displayed on the page. |
| Policy status | Indicates whether the event is active. |
| View Details | Check and edit event details. Click View Details to open the detailed information popup for the event. |
| Add | Adds an event. |
| Delete | Deletes an event. |
| Activate | Enables or disables the event. |
| Notification recipients | Check and manage event notification recipients. |

Table. Event Settings
- The name of the monitoring target can include Korean characters, English letters (both uppercase and lowercase), numbers, and special symbols (-, _, .), and can be up to 100 characters long.
- When you do not have permission for a monitoring target, information about the unauthorized target and a permission verification message are displayed in a popup.
Check detailed event settings
You can view detailed information about the monitoring target and event conditions, and modify the event conditions and notification information.
Add Event Settings
To add an event setting, follow these steps.
- Event policies can only be set when a monitoring target is specified.
- Auto-Scaling Group policies can be applied on a per-group basis.
Click Cloud Monitoring Console > Event Management > Event Settings. Navigate to the Event Settings page.
Click the Add button on the Event Settings page. The Add Event Settings popup opens.
| Item | Description |
|---|---|
| Target Name | Select the monitoring target to which the event setting is added. Click the monitoring target list to change the service. Changing the service deletes all event conditions created so far. Click the Add button to search for and add monitoring targets of the currently selected service. The selected monitoring target is displayed on the page, and you can delete the monitoring target by clicking X or Delete All. |
| Event Settings Area | Set the performance and occurrence conditions for the event. |
| Notification information area | Set the notification recipients and method when an event occurs. |

Table. Description of the Add Event Settings Popup

After selecting the service type in the monitoring target area, click the Add button. The add monitoring target popup window will open.
After selecting the monitoring target, click the Confirm button.
- You can select multiple monitoring targets simultaneously.
- If there are multiple monitoring targets, the configured event is added identically to each monitoring target.
- If you select Kubernetes, you must also select the sub-type of that service.
In the performance items, click the item where you want to add an event, then enter the event trigger condition.
- The added performance items display the count of additions next to the performance name.
- If you select multiple performance items, you must enter the event occurrence condition for each performance item.
| Item | Description |
|---|---|
| Load Event Policy Template | Select and apply an existing event policy template. |
| Performance items | Click the performance metric for which you want to set the event trigger condition to add it to the event condition configuration area. |
| Event rating | Set the event severity level. Fatal: the highest risk level. Warning: a medium risk level. Information: the lowest risk level, for reference. |
| Performance type | Select the value used to determine whether an event occurs. Collected value: uses the current value. Delta value: uses the difference between the previous and current values. |
| Threshold | Set the reference value to compare with the collected performance values. It serves as the criterion for determining whether an event occurs. Only numbers and decimal points can be entered. |
| Comparison method | Select how the monitoring value of the performance item is compared with the threshold to determine whether an event occurs. Range: checks whether the performance value is within the range specified by the threshold. Match: checks whether the performance value equals the threshold. Different: checks whether the performance value differs from the threshold. At least: checks whether the performance value is greater than or equal to the threshold. Greater than: checks whether the performance value exceeds the threshold. At most: checks whether the performance value is less than or equal to the threshold. Less than: checks whether the performance value is less than the threshold. |
| Individual item | Specify an individual performance item under the performance items as an event condition. It is enabled only when the performance item can collect the individual item. |
| Prefix | Add a prefix to the event message. The Event Status page uses it as a keyword to search for the event. |
| Statistics | Set the statistical method to apply to the collected performance values. When statistics are configured, the performance value calculated using the selected statistical method is compared against the threshold when evaluating event trigger conditions. If not selected, the most recent performance value is compared to the threshold. Statistical Method: choose one of maximum, minimum, average, or sum to compute over the collected performance values. Statistical Period: set the time span over which the statistical method is applied, measured back from the most recently collected performance value. |
| Sustained occurrence count | Set the number of consecutive monitoring values that must satisfy the event occurrence condition. This value acts as a sensitivity setting that distinguishes a momentary outlier from a real event. |
| Event occurrence notification time zone | Time zone setting applied when configuring event policies. |

Table. Add Event Settings - Event Settings Area
In the notification information area, set the notification recipients.

| Item | Description |
|---|---|
| Notification recipient selection area | Select notification recipients. After selecting notification recipients, click the Delete button to remove the selected recipients. |
| Notification recipients / group | Displays the list of recipients who receive the notification when an event occurs. |
| Event risk level | Displays the risk level of the configured event. |
| Notification method | Displays the method of delivering notifications to the recipient. |
| Add | Select and add a new notification recipient from the address book. |
| Delete | Deletes a notification recipient from the notification recipients / group. |

Table. Add event settings - notification info area

Check the notification recipients, select them, and click the Confirm button.
- Only the account’s root user or an IAM user can be added as a notification recipient.
- You can select multiple items simultaneously.
- Set the notification method for each recipient according to the event risk level.
- The notification method can be selected from email, SMS, and messenger, and multiple methods can be selected simultaneously.
- When the notification method setup is complete, click the Confirm button.
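The performance type and comparison method options described in the event settings area can be modeled as a small function. This Python sketch is an illustration under assumptions, not the service's logic: the `Condition` class and function names are hypothetical, Range is modeled with a lower and an upper bound that are both inclusive, and a delta is computed as current minus previous.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison methods other than Range map directly to Python operators.
COMPARATORS = {
    "Match": operator.eq,
    "Different": operator.ne,
    "At least": operator.ge,
    "Greater than": operator.gt,
    "At most": operator.le,
    "Less than": operator.lt,
}


@dataclass
class Condition:
    method: str                    # one of COMPARATORS or "Range"
    threshold: float               # lower bound when method is "Range"
    upper: Optional[float] = None  # upper bound, used only by "Range"
    performance_type: str = "collected"  # "collected" or "delta"


def observed_value(cond, previous, current):
    """Return the value to compare: the current value or the delta."""
    if cond.performance_type == "delta":
        return current - previous
    return current


def is_triggered(cond, value):
    """Check one monitoring value against the event condition."""
    if cond.method == "Range":
        if cond.upper is None:
            raise ValueError("Range needs an upper bound")
        return cond.threshold <= value <= cond.upper
    return COMPARATORS[cond.method](value, cond.threshold)
```

In this model a value of 80 triggers an "At least 80" condition but not a "Greater than 80" condition, which is the practical difference between the two methods.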
Modify Event Settings
To edit the event’s conditions and notification recipient information, follow the steps below.
- Click Cloud Monitoring Console > Event Management > Event Settings. You will be taken to the Event Settings page.
- On the Event Settings page, enter the search criteria for the service whose event settings you want to modify in the search area, then click the Search button.
- From the event policy list, click the View Details button of the event policy you want to edit. You will be taken to the Event Settings Details page.
- On the Event Settings Details page, click the Edit button. You will be taken to the Edit Event Settings page.
- On the Edit Event Settings page, enter the information to be modified, then click the Confirm button.
- You can edit the event conditions and notification information.
Delete Event Settings
To delete the event configuration, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Settings. You will be taken to the Event Settings page.
- On the Event Settings page, enter the search criteria for the service whose event policy you want to delete in the search area, then click the Search button.
- After checking the event policy to delete in the event policy list, click the Delete button.
- In the confirmation popup, click the Confirm button.
Change Event Settings Activation
You can easily change whether the event policy is enabled.
- Click Cloud Monitoring Console > Event Management > Event Settings. Go to the Event Settings page.
- On the Event Settings page, enter the search criteria for the service whose event policy activation you want to change in the search area, then click the Search button.
- In the event policy list, check the event policy whose activation you want to change, then click the Activate button. The Policy Activation popup window will open.
- After selecting the activation status, click the Confirm button.
- Click the Enable All or Disable All button to change the selected policies in bulk.
Change Event Notification Recipients
You can verify the recipients of notifications when an event occurs and change them in bulk.
- The event notification recipient change feature is intended to modify event notification recipients in bulk. Consequently, the existing recipients are removed and replaced with the new recipient settings.
- To view and modify the notification recipients for each policy, click the Edit button on the policy’s detail page, then make the changes.
- Click Cloud Monitoring Console > Event Management > Event Settings. Go to the Event Settings page.
- On the Event Settings page, enter the search criteria for the service whose event notification recipients you want to change in the search area, then click the Search button.
- After checking the event policy to edit in the event policy list, click the Notification Recipients button. You will be taken to the Notification Recipients page.
- On the Notification Recipients page, after selecting the user to add as a notification recipient, click the Confirm button.
| Item | Description |
|---|---|
| Event policy list | Displays the list of event policies whose notification recipients are to be changed. Click Add to add a policy to be changed. Click the Delete button in the policy list to remove that policy. |
| User search area | Enter a name, email, mobile phone number, or company name to search. |
| Notification address book | Use the notification address book to verify and add users. |
| Search User List | Displays the list of users included in the notification address book or in the search results. Users you check are added to the notification recipient list. |
| Notification recipient list | Displays the list of users to be added as notification recipients for the events shown in the event policy list. After checking a user, click the Delete button to remove that user from the list. |

Table. Change Event Notification Recipients
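Because the bulk change replaces recipients rather than adding to them, it may help to see the difference spelled out. The Python sketch below models policies as a plain dictionary; the function name and data layout are hypothetical and exist only to illustrate the replace semantics described in this section.

```python
def bulk_change_recipients(policies, selected_ids, new_recipients):
    """Return a copy of policies where selected ones get the new recipients.

    policies: dict mapping policy id to a list of recipient names.
    Existing recipients of selected policies are discarded, not merged.
    """
    updated = {}
    for policy_id, recipients in policies.items():
        if policy_id in selected_ids:
            updated[policy_id] = list(new_recipients)  # replace, do not append
        else:
            updated[policy_id] = list(recipients)
    return updated
```

To keep an existing recipient on a policy, that recipient has to be included in the new list as well.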
Managing Event Templates
You can set the monitoring target, performance values that define event occurrence criteria, and the event risk level, then create and use a template. When adding or modifying an event, you can import an event policy template to easily enter the event conditions.
Check the list of event policy templates
To view the list of event policy templates, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Settings. Navigate to the Event Settings page.
- On the Event Settings page, click Event Policy Template. You will be taken to the Event Policy Template page.
- On the Event Policy Template page, enter the search criteria for the service whose template you want to check in the search area, then click Search.
| Item | Description |
|---|---|
| Search area | Enter the conditions of the event policy template to search. |
| Add event policy template | Adds an event policy template. |
| Template List | Displays the event policy templates that match the search criteria. |

Table. Event Policy Template List
Add event policy template
To add an event policy template, follow these steps.
Click Cloud Monitoring Console > Event Management > Event Settings. Navigate to the Event Settings page.
On the Event Settings page, click the Event Policy Template button. You will be taken to the Event Policy Template page.
On the Event Policy Template page, click the Add Event Policy Template button. The Add Event Policy Template popup opens.
In the Add Event Policy Template popup, set the service type and template information for the event policy template.
* Items marked with * are required fields and must be entered.

| Item | Description |
|---|---|
| Service Type | Select the service type for the event policy. Click the service type list to change the service. If you change the service, all event conditions created so far will be lost. |
| Template name | Enter the name of the template to create. |
| Template description | Enter a description for the template to create. |

Table. Add event policy template – set service type and template name

In the performance items, click the item where you want to add an event, then enter the event trigger condition.
- The added performance items display the count of additions next to the performance name.
- If you select multiple performance items, you must enter the event trigger condition for each item.
* Items marked with * are required fields and must be entered.

| Item | Description |
|---|---|
| Load Event Policy Template | Select and apply an existing event policy template. When you load the template, the event conditions and notification recipients are replaced with the information set in the template. |
| Performance items | Click the performance metric for which you want to set the event trigger condition to add it to the event condition configuration area. |
| Event rating | Set the event severity. Fatal: the highest risk level. Warning: a medium risk level. Information: the lowest risk level, for reference only. |
| Performance type | Select the value used to determine whether an event occurs. Collected value: uses the current value. Delta value: uses the difference between the previous and current values. |
| Threshold | Set the reference value to compare with the collected performance values. It serves as the criterion for determining whether an event occurs. Only numbers and decimal points can be entered. |
| Comparison method | Select how the monitoring value of the performance item is compared with the threshold to determine whether an event occurs. Range: checks whether the performance value is within the range specified by the threshold. Match: checks whether the performance value matches the threshold. Different: checks whether the performance value differs from the threshold. At least: checks whether the performance value is greater than or equal to the threshold. Greater than: checks whether the performance value exceeds the threshold. At most: checks whether the performance value is less than or equal to the threshold. Less than: checks whether the performance value is less than the threshold. |
| Individual item | Specify an individual performance item under the performance items as an event condition. It is enabled only when the performance item can collect the individual item. |
| Prefix | Add a prefix to the event message. It is used as a keyword to search for this event on the Event Status page. |
| Statistics | Set the statistical method to apply to the collected performance values. When statistics are set, the performance value calculated using the configured statistical method is compared to the threshold when determining event trigger conditions. If not selected, the most recent performance value is compared to the threshold. Statistical Method: choose one of maximum, minimum, average, or sum to compute over the collected performance values. Statistical Period: set the period over which the statistical method is applied, measured back from the most recently collected performance value. |
| Number of occurrences | Set the number of consecutive monitoring values that must satisfy the event trigger condition. This value acts as a sensitivity setting that distinguishes a transient anomaly from a genuine event. |
| Event occurrence notification time window | Time zone setting applied when configuring event policies. |

Table. Add event policy template – performance item
Set the notification recipients and the delivery method used when an event occurs.

| Item | Description |
|---|---|
| Add | Select and add a new notification recipient from the address book. |
| Delete | Deletes the selected notification recipients from the notification recipients / group. |
| Notification recipients / groups | Displays the list of recipients who receive the notification when an event occurs. After selecting a notification recipient, clicking the Delete button removes that recipient. |
| Event risk level | Displays the risk level of the event to be delivered. |
| Notification method | Displays the method of delivering notifications to the recipient. You can choose among email, SMS, and messenger, and you can select multiple methods simultaneously. |

Table. Add event policy template – set notification recipients
- Only Account members and users in the notification address book registered in the Account can be added as recipients.
- You can select multiple items simultaneously.
- Click the Confirm button. The event policy template will be added along with a toast popup message.
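The Statistics and Number of occurrences settings in the template can be illustrated with a short Python sketch. It is a simplified model, not the service's algorithm: the function names are hypothetical, the statistical period is expressed as a count of the most recent samples rather than a time span, and each evaluation is reduced to a boolean that says whether the condition was met.

```python
from statistics import mean

# The four statistical methods named in the guide.
STATISTICS = {"maximum": max, "minimum": min, "average": mean, "sum": sum}


def evaluated_value(samples, method=None, period=1):
    """Return the value that is compared with the threshold.

    samples are ordered oldest to newest. Without a statistical method,
    the most recent sample is used, as the guide describes.
    """
    if method is None:
        return samples[-1]
    if period < 1:
        raise ValueError("period must be at least 1")
    return STATISTICS[method](samples[-period:])


def should_raise_event(breaches, sustained_count):
    """True only if the last sustained_count evaluations all met the condition."""
    if sustained_count < 1:
        raise ValueError("sustained_count must be at least 1")
    recent = breaches[-sustained_count:]
    return len(recent) == sustained_count and all(recent)
```

A higher occurrence count makes the policy less sensitive: a single spike among otherwise normal values does not raise an event when the count is 2 or more.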
Edit and delete event policy templates
To modify or delete an event policy template, follow the steps below.
- Click Cloud Monitoring Console > Event Management > Event Settings. You will be taken to the Event Settings page.
- On the Event Settings page, click the Event Policy Template button. You will be taken to the Event Policy Template page.
- On the Event Policy Template page, enter the search criteria for the service whose template you want to view in the search area, then click the Search button.
- Click the More button at the top right of the template you want to edit or delete, and then click Edit or Delete.
- Edit: The template edit popup window opens. After editing the template, click the Confirm button.
- Delete: The template will be deleted along with a toast popup message.
- Click the Confirm button. The template will be deleted along with a toast popup message.
Share event policy template
To share the event policy template, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Settings. Navigate to the Event Settings page.
- On the Event Settings page, click the Event Policy Template button. You will be taken to the Event Policy Template page.
- On the Event Policy Template page, enter the search criteria for the service whose template you want to view in the search area, then click the Search button.
- Click the More > Share button located at the top right of the template you want to share.
- After selecting the user to share with, click the > button. The selected user will be added to the sharing target.
- Click the Confirm button. The template will be shared with a toast popup message.
Filtering events
You can filter notifications for events that occur during a specific period. While event filtering is applied, notifications will not be delivered even if events occur.
To view the event filtering list, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Filtering. You will be taken to the Event Filtering page.
| Item | Description |
|---|---|
| Filtering Timeline | Displays the timeline of registered filters by date. Registered filters are displayed on the timeline as bars, and clicking a bar shows the filter’s detailed information. The numbers from 00 on the left to 23 on the right represent the hour of the day. The blue vertical line below the time indicates the current time. Click < or > to change the displayed date. |
| Filtering list | Displays a list of information and operational status of registered filters. Running: The filter is registered and currently operating. Ended: The filter’s operation has ended after the set period has passed. Scheduled: The filter registration is complete and is pending; the filter will operate when the set period arrives. Disabled: The filter is in a stopped state; it is displayed when ‘Use’ is not selected in the detailed settings. |
| Add | Add new event filtering. |
| Delete | Delete the selected event filter from the filter list. |
| Search area | Search for event filtering or monitoring targets. |
Table. Event Filtering List
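The four filter states in the filtering list can be derived from the usage setting and the configured period. The following is a minimal sketch, assuming a one-time (non-repeating) filter whose start and end times are both inclusive. The function `filter_status` and its parameters are hypothetical names used for illustration only.

```python
from datetime import datetime


def filter_status(enabled: bool, start: datetime, end: datetime, now: datetime) -> str:
    """Return the display status of a one-time event filter.

    Assumptions: `start`, `end`, and `now` are expressed in the same time
    zone, and the period boundaries are inclusive.
    """
    if not enabled:
        return "Disabled"   # 'Use' is not selected in the detailed settings
    if now < start:
        return "Scheduled"  # registered, waiting for the period to begin
    if now <= end:
        return "Running"    # currently suppressing notifications
    return "Ended"          # the set period has passed
```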
Add event filtering
To add event filtering, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Filtering. Navigate to the Event Filtering page.
- Click the Add button on the Event Filtering page. The Add Event Filtering popup opens.
- In the Add Event Filtering popup window, enter the filtering information.
| Item | Description |
|---|---|
| Event filtering | Enter the name of the event filter. |
| Usage status | Set whether event filtering is used. If set to Not Used, the filter is displayed as Disabled and does not operate until it is changed to Enabled. |
| Time zone | Set the reference time zone for applying event filtering. |
| Iteration type | Set how event filtering repeats. No repeat: Enter the start and end year, month, day, hour, and minute; filtering is applied only once without repetition. Daily, weekdays: Enter only the start time and end time; filtering repeats daily at the specified times. |
| Period | Set the period during which event filtering is applied. Applied Time: Active for repeating filters; displays the elapsed time from the start time to the end time. Conversion Period: Displays the event filtering period converted to the time zone set by the user. |
| Event filtering target | Select the service type and monitoring target to apply event filtering to, then add them. |
Table. Add event filtering
- Click the Confirm button. Event filtering will be added with a toast popup message.
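The daily and weekday repeat types combine a time zone with a start and end time. The following is a minimal sketch of how such a window could be evaluated, assuming the time zone is given as a fixed UTC offset and the window does not cross midnight. The function `in_daily_window` and its parameters are hypothetical and do not describe the service's internal implementation.

```python
from datetime import datetime, time, timedelta, timezone


def in_daily_window(
    now: datetime,
    tz: timezone,
    start: time,
    end: time,
    weekdays_only: bool = False,
) -> bool:
    """Check whether `now` falls inside a filter window that repeats every day.

    Assumptions: `now` is timezone-aware, `tz` is the filter's reference
    time zone, and `start <= end` (the window stays within one day).
    """
    if start > end:
        raise ValueError("this sketch does not handle windows that cross midnight")
    local = now.astimezone(tz)  # evaluate the clock time in the filter's time zone
    if weekdays_only and local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return start <= local.time() <= end
```

For example, 16:30 UTC is 01:30 on the following day at UTC+9, so the same instant can fall inside a 01:00 to 02:00 window in one time zone and outside it in another. This is why the reference time zone matters when the filter is registered.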
Modify Event Filtering
To modify event filtering, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Filtering. Proceed to the Event Filtering page.
- On the Event Filtering page, click the name of the filter you want to edit. The Event Filtering Details popup window opens.
- In the Event Filtering Details popup window, click the Edit button. The Edit Event Filtering popup window opens.
- In the Edit Event Filtering popup window, enter the changes, then click the Confirm button. The event filtering will be updated with a toast popup message.
Delete event filtering
To delete event filtering, follow these steps.
- Click Cloud Monitoring Console > Event Management > Event Filtering. Navigate to the Event Filtering page.
- On the Event Filtering page, select the event filter you want to delete, then click the Delete button. The event filter will be deleted along with a toast popup message.
- You can select multiple event filters simultaneously.
Managing Notification Groups
You can group the recipients who receive notifications when an event occurs into a single group for management. Notification Group allows you to efficiently manage notification recipients and configure notification settings easily and quickly.
To check the notification group, follow the steps below.
- Click Cloud Monitoring Console > Event Management > Notification Group. You will be taken to the Notification Group page.
- On the Notification Group page, you can view and manage notification groups.
| Item | Description |
|---|---|
| Add notification group | Add a new notification group. |
| Notification Group | Displays a list of all notification groups created by the user. Clicking a notification group opens the notification group details popup. Click the Edit button to modify the notification group. |
| Advanced Search | You can search the address book by entering the notification group name. |
| Keyword search | You can search by selecting the notification group, user name, creation timestamp, and last modified timestamp. |
Add Notification Group
To add a notification group, follow these steps.
- Click Cloud Monitoring Console > Event Management > Add Notification Group.
- On the Add Notification Group page, enter the notification group name and description, then add users.
- Click the Save button to add the notification group.
Edit Notification Group
You can add a user to a notification group or delete a user registered in the notification group.
Add User
To add a user to the notification group, follow these steps.
- Click Cloud Monitoring Console > Event Management > Notification Group.
- In the list of all notification groups, click the notification group to which you want to add a user, then click Edit.
- Select the user to add.
- Only users registered in the Account can be added to the address book.
- You can quickly find the desired members using the real-time search.
- Click the Save button. The user address will be added with a toast popup message.
Delete Notification Group
To delete a notification group, follow these steps.
- Click Cloud Monitoring Console > Event Management > Notification Group.
- In the list of all notification groups, click the notification group you want to delete.
- After selecting the notification group to delete, click Delete.
- You can select multiple notification groups simultaneously.
- Click the Confirm button. The notification group will be deleted along with a toast popup message.
2.5 - Using Custom Dashboards
Custom dashboards are personalized dashboards that users configure by selecting the widgets they want. Users can use custom dashboards to arrange monitoring information as they wish, and they can share the created custom dashboards with other users.
Getting Started with Custom Dashboard
After creating a custom dashboard, the user can add desired widgets to view monitoring information.
Create a custom dashboard
To create a custom dashboard, follow these steps.
- From the top‑right menu, click Custom Dashboard Management. You will be taken to the Custom Dashboard Management page.
- Click Add Dashboard. The Add Dashboard popup window opens.
- Enter the dashboard name to create and click the Save button.
- The custom dashboard you created appears in the My Dashboard list.
Add widget
Custom dashboards provide widgets in various formats such as performance statistics, comparison charts, and event lists. Users can add the information they want to monitor as widgets and freely configure the custom dashboard.
- You can change the position and size of a created widget, or edit, copy, and delete it. For more details, see Managing Custom Widgets.
To add a widget, follow these steps.
- From the top‑right menu, click Custom Dashboard Management. You will be taken to the Custom Dashboard Management page.
- From the My Dashboard list, select the custom dashboard to add a widget.
- Click the + button or the Add Widget button at the top right of the dashboard. The Add Widget popup window opens.
- In the Add Widget popup window, select the widget to use on the dashboard and add it.
- When you select a widget, detailed settings and a preview are displayed.
- For explanations and configuration methods for each chart, refer to Custom Widget.
- Click the Confirm button.
Custom widget
The types of widgets that can be added to a custom dashboard are as follows.
| Widget name | Description |
|---|---|
| Title Box | Displays a title box on the custom dashboard. |
| Event status | Displays the events that have occurred. |
| Monitoring Status | Displays the number of monitoring targets and the monitoring status. |
| Key Performance Top 5 | Displays the top 5 monitoring targets with the highest utilization for a specific performance metric. |
| Event map | Displays the number of events per service by severity level. |
| Event History | Displays the number of events per date by severity. |
| Time Series Graph | Displays the performance metrics of the selected monitoring target as a time-series graph. |
| Current Status Indicator | Displays the performance value statistics and risk levels of the selected monitoring targets. |
| Instance Map | Displays the performance values of the selected monitoring targets using colors of varying intensity. |
Title Box
Displays a title box on the custom dashboard.
- You can create up to 10 title boxes.
- You can add multiple title boxes at the same time.
| Item | description |
|---|---|
| Title | Enter the text to display in the title box. |
| Add | Add a new text box. |
| Delete | Delete the corresponding text box. |
Event status
Displays the events that have occurred.
- You can configure it to display all events that have occurred, or only the active events.
| Item | Description |
|---|---|
| Widget name | Enter the name of the widget. |
| Query range | Select the range of events to display in the widget. |
Monitoring Status
Displays the number of monitoring targets and the monitoring status.
| Item | explanation |
|---|---|
| Widget name | Enter the name of the widget. |
Key Performance Top 5
Shows the top five monitoring targets with the highest usage rate for a specific performance metric within the Account.
| Item | Description |
|---|---|
| Widget name | Enter the name of the widget. |
| Service | Select the service to check performance. |
| Performance metrics | Select the performance metric that serves as the basis for displaying the monitoring targets. |
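The ranking behind a Top 5 widget amounts to sorting targets by the selected metric and keeping the first five. The following is a minimal sketch of that selection, assuming the metric values are already available as a mapping from target name to usage. The function `top_n` and the sample target names are hypothetical.

```python
from typing import Dict, List, Tuple


def top_n(usage_by_target: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the `n` targets with the highest usage, highest first."""
    ranked = sorted(usage_by_target.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical CPU usage (%) for six monitoring targets.
sample = {"vm-a": 12.0, "vm-b": 88.5, "vm-c": 45.0, "vm-d": 97.1, "vm-e": 60.2, "vm-f": 5.3}
top_five = top_n(sample)
```

In this sample, `vm-f` has the lowest usage and is the only target left out of `top_five`.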
Event map
Displays the number of events per service by severity level.
| Item | description |
|---|---|
| Widget name | Enter the name of the widget. |
Event History
Displays the number of events per date, grouped by severity.
| Item | description |
|---|---|
| Widget name | Enter the name of the widget. |
Time Series Graph
Displays the performance metrics of the selected monitoring target as a time-series graph.
- You can change the period displayed by the time series graph using the dashboard’s date range setting feature.
- When you place the mouse cursor over the graph, you can view the time and performance values for each target at that point.
| Item | description |
|---|---|
| Widget name | Enter the name of the widget. |
| Service | Select the service to check performance. |
| Monitoring target | Select the monitoring target to display as a graph. |
| Performance items | Select the performance metric to display as a graph. |
| Add option | You can display a danger zone. |
You can click the icon at the top right of the preview to change the graph type.
- Line graph
- Area chart
- Stacked bar chart
- Scatter plot
Current Status Indicator
Displays statistical figures and risk levels for the performance values of monitored entities.
In the monitoring dashboard, if you place the mouse cursor over a status indicator value, you can view detailed information for that item.
| Item | Description |
|---|---|
| Widget name | Enter the name of the widget. |
| Service | Select the service to check performance. |
| Monitoring target | Select the monitoring target to display as a graph. |
| Performance metrics | Select the performance items to display as a graph. |
| Statistics | Select the statistical method used to display the performance values of the monitoring target. |
| Add option | You can display a danger zone. |
Instance Map
Displays the performance values of monitoring targets using colors of varying intensity.
- When you position the mouse cursor over a cell of the heatmap, you can view detailed information about that item.
| Item | description |
|---|---|
| Widget name | Enter the name of the widget. |
| Service | Select the service to check performance. |
| Monitoring target | Select the monitoring target to display as a graph. |
| Performance metrics | Select the performance metric to display as a graph. |
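Displaying values as colors of varying intensity usually means scaling each value into a fixed range and mapping that to a color. The following is a minimal sketch of one such mapping, assuming a linear scale and a white-to-red palette. The actual palette and scale used by the Instance Map are not documented here, and the function names are hypothetical.

```python
def intensity(value: float, low: float, high: float) -> float:
    """Scale `value` linearly into 0.0..1.0, clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    ratio = (value - low) / (high - low)
    return max(0.0, min(1.0, ratio))


def to_hex_color(level: float) -> str:
    """Blend from white (level 0.0) to full red (level 1.0). Assumed palette."""
    channel = round(255 * (1.0 - level))  # green and blue fade out as level rises
    return f"#ff{channel:02x}{channel:02x}"
```

A target at the low end of the range is drawn in white (`#ffffff`), and a target at or above the high end is drawn in full red (`#ff0000`).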
Check Custom Dashboard
To view the custom dashboard, follow these steps.
- From the top‑right menu, click Custom Dashboard Management. You will be taken to the Custom Dashboard Management page.
- From the My Dashboard list, select the Custom Dashboard.
| Item | Description |
|---|---|
| Dashboard List | Displays the list of custom dashboards. Click a list item to change the dashboard to view. My Dashboards: Displays the list of dashboards you created yourself. Shared Dashboards: Displays the list of dashboards shared with you. |
| Dashboard name | Displays the name of the custom dashboard. |
| Dashboard Settings | Date/Time: Displays the reference date and time for the analysis information. Refresh: Refreshes to the current time. Stop/Start: Turns the automatic refresh feature off or on. Settings: Sets the data query period or changes the automatic refresh interval (see “Setting the query period”). |
| Add widget | Add a new widget to the dashboard. |
| Dashboard editing | Edits the currently configured custom dashboard. Dashboard Edit: Modify the name of the currently selected dashboard. Dashboard Copy: Copy the currently selected dashboard to create a custom dashboard with the same widgets. Dashboard Delete: Delete the currently selected dashboard. Dashboard Share: Share the dashboard so that specific users can view it. For more information, see Sharing Custom Dashboards. |
| Custom widget | Displays the widgets that make up the dashboard. You can change a widget’s position and size, or edit and delete it. For more information, see Managing Custom Widgets. You can download graphic widgets as image files. |
Table. Custom dashboard information
Download widget
Graphic widgets can be downloaded as image files (*.png).
When you hover the mouse over a graph widget, a download button appears in the upper right corner. Clicking the download button downloads the widget as an image file.
Share Custom Dashboard
You can share a custom dashboard and configure it so that other users can view it.
To share a custom dashboard, follow these steps.
- From the top‑right menu, click Custom Dashboard Management. You will be taken to the Custom Dashboard Management page.
- From the My Dashboard list, select the Custom Dashboard.
- Click Dashboard More at the top right, then click Dashboard Share. The Dashboard Share popup opens.
- After selecting the user to share the dashboard with, click the > button and verify that the selected user moves to the shared target.
- Click the Confirm button.
Managing Custom Dashboards
You can edit, copy, or delete a custom dashboard.
- From the top-right menu, click Custom Dashboard Management. You will be taken to the Custom Dashboard Management page.
- From the My Dashboard list, select the Custom Dashboard.
- Click the More button at the top right of the dashboard, then select the desired command.
- Dashboard Edit: Edits the dashboard name.
- Dashboard Copy: Copies the dashboard to create a new dashboard.
- Dashboard Share: Shares the dashboard with other users.
- Dashboard Delete: Deletes the dashboard.
Managing custom widgets
You can change a widget’s position and size, or edit, copy, and delete the widget.
Change Widget Position
Click the widget’s name, then drag to change its position.
Changing Widget Size
To change the size of the widget, follow these steps.
- Place the mouse cursor over the widget. The size adjustment button appears in the lower right corner of the widget.
- Click and hold the size adjustment button, then drag to adjust the widget to the desired size.
Edit, copy, delete widget
To modify, copy, or delete a widget, follow these steps.
- Place the mouse cursor over the widget. The More button appears in the top right corner of the widget.
- After clicking the More button, click the desired command.
- Widget Edit: Modifies the widget’s chart settings.
- Widget Copy: Copies the widget to create a widget with identical content.
- Widget Delete: Deletes the widget.
2.6 - Managing Agents
The agent is a module that collects performance metrics, logs, and Windows events from the monitoring target. Users must verify the agent’s installation status and operate and manage it in order to use the monitoring functionality.
- If IP access control is configured for the monitoring target, you cannot use agent management. If agent management cannot be used, check the IP access control configuration status of the selected monitoring target.
- The agent management feature uses the sudo command, so the sudo package must be installed in advance.
Agent Management Overview
The agents include a performance collection agent, a log collection agent, and a Windows event log collection agent.
- The agent must be manually installed by the user on each monitoring target according to the user’s requirements.
Manage Agents
Managing Performance Agents
To install and manage the agent, follow these steps.
- Click Cloud Monitoring Console > Performance Analysis. You will be taken to the Performance Analysis page.
- On the Performance Analysis page, select a monitoring target and click the View Details button. The Monitoring Target Details popup window opens.
- In the Monitoring Target Details popup window, click the Agent tab. The Agent tab opens.
- Click the Performance button in the Agent tab.
- Click the Copy icon on the right side of the installation command to copy the command.
- Paste the copied command into the monitoring target resource.
- Execute the command copied to the monitoring target resource.
| Item | description |
|---|---|
| Installation | Download the script file required for agent installation and execute it. |
| Start | Execute the agent start command. |
| Stop | Execute the agent stop command. |
| Delete | Execute the agent deletion command. |
| Update | Download the script file required for the agent update and execute it. |
To check the agent service status, use the method below.
- Linux: $ sudo systemctl status metricbeat
- Windows: Task Manager → Services → metricbeat → Status (Running)
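On Linux, the status check above can also be scripted. The following is a minimal sketch that uses `systemctl is-active` instead of `systemctl status`, because `is-active` prints a single state word that is easy to compare. The function `agent_is_active` and its `run` parameter are hypothetical helpers introduced for illustration, and the sketch assumes the agent is registered as a systemd unit named `metricbeat`.

```python
import subprocess
from typing import Callable


def agent_is_active(
    service: str = "metricbeat",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when `systemctl is-active <service>` prints "active"."""
    try:
        result = run(
            ["systemctl", "is-active", service],
            capture_output=True,
            text=True,
            check=False,  # a non-zero exit code only means "not active"
        )
    except FileNotFoundError:
        # systemctl is not available on this host (for example, Windows).
        return False
    return result.stdout.strip() == "active"
```

The `run` parameter exists only so that the check can be exercised without a real systemd host. The same function could be called with another unit name to check a different agent.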
Managing Log Agents
To install and manage the agent, follow these steps.
- Click Cloud Monitoring Console > Performance Analysis. You will be taken to the Performance Analysis page.
- On the Performance Analysis page, select a monitoring target and click the View Details button. The Monitoring Target Details popup window opens.
- In the Monitoring Target Details popup window, click the Agent tab. The Agent tab opens.
- Click the Log button.
- Click the Copy icon on the right side of the installation command to copy the command.
- Paste the copied command into the monitoring target resource.
- Execute the command copied to the monitoring target resource.
| Item | Explanation |
|---|---|
| Installation | Download the script file required for agent installation and execute it. |
| Start | Execute the agent start command. |
| Stop | Execute the agent stop command. |
| Delete | Execute the agent deletion command. |
| Update | Download the script file required for the agent update and execute it. |
To check the agent service status, use the method below.
- Linux: $ sudo systemctl status filebeat
- Windows: Task Manager → Services → filebeat → Status (Running)
To add a log for monitoring, select the log addition action, enter the log name and log path correctly, and then click the Generate Command button. Paste the generated command into the monitored resource and then execute it.
Managing Event Agents
To install and manage the agent, follow the steps below.
- Click Cloud Monitoring Console > Performance Analysis. You will be taken to the Performance Analysis page.
- On the Performance Analysis page, select a monitoring target and click the View Details button. The Monitoring Target Details popup window opens.
- In the Monitoring Target Details popup window, click the Agent tab. The Agent tab opens.
- Click the Event button.
- Click the Copy icon on the right of the installation command to copy the command.
- Paste the copied command into the monitoring target resource.
- Execute the command copied to the monitoring target resource.
| Item | Explanation |
|---|---|
| Installation | Download the script file required for agent installation and execute it. |
| Start | Execute the agent start command. |
| Stop | Execute the agent stop command. |
| Delete | Execute the agent deletion command. |
| Update | Download and run the script file required for the agent update. |
To check the agent service status, use the method below.
- Windows: Task Manager → Services → winlogbeat → Status (Running)
2.7 - Appendix A. Service-specific Monitoring Targets
Compute type
Virtual Server
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | OS | Agent, Agentless | 1m |
| Log | OS | Agent | When a log occurs |
| Status | OS | Agentless | 1m |
GPU Server
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | OS | Agent, Agentless | 1m |
| Log | OS | Agent | When a log occurs |
| Status | OS | Agentless | 1m |
Bare Metal Server
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | OS | Agent | 1m |
| Log | OS | Agent | When a log occurs |
| Status | OS | N/A | - |
Multi-node GPU Cluster [Cluster Fabric]
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | OS | Agent | 1m |
| Log | OS | Agent | When a log occurs |
| Status | OS | N/A | - |
Multi-node GPU Cluster [Node]
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | OS | Agent | 1m |
| Log | OS | Agent | When a log occurs |
| Status | OS | N/A | - |
Storage type
The monitoring targets, collection methods, and collection intervals are the same for all storage-type services.
- File Storage
- Object Storage
- Block Storage(BM)
- Block Storage(VM)
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Storage | Agentless | 1m |
| Log | Storage | N/A | - |
| Status | Storage | Agentless | 1m |
Database type
The monitoring targets, collection methods, and collection intervals are the same for all Database-type services.
- PostgreSQL(DBaaS)
- MariaDB(DBaaS)
- MySQL(DBaaS)
- Microsoft SQL Server
- EPAS
- CacheStore(DBaaS)
- Redis
- Valkey
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Database Process, OS | Agent | 1m |
| Log | Database Process, OS | Agent | When a log occurs |
| Status | Database Process | Agent | 1m |
| Status | OS | Agentless | 1m |
Data Analytics type
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Data Analytics Process, OS | Agent | 1m |
| Log | Data Analytics Process, OS | Agent | When a log occurs |
| Status | Data Analytics Process | Agent | 1m |
| Status | OS | Agentless | 1m |
Container type
Kubernetes Engine
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Cluster, Namespace, Node, ReplicaSet, Deployment, StatefulSet, DaemonSet, Job, CronJob, Pod | Agentless | 5m |
| Log | Cluster, Namespace, Node, ReplicaSet, Deployment, StatefulSet, DaemonSet, Job, CronJob, Pod | Agentless | When a log occurs |
| Status | Cluster, Namespace, Node, ReplicaSet, Deployment, StatefulSet, DaemonSet, Job, CronJob, Pod | Agentless | 5m |
Container Registry
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Container Registry | Agentless | 5m |
| Log | Container Registry | Agentless | When a log occurs |
| Status | Container Registry | Agentless | 5m |
Networking type
VPC
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Internet Gateway | Agentless | 5m |
| Log | Internet Gateway | N/A | - |
| Status | Internet Gateway | N/A | - |
Load Balancer(OLD)
Load Balancer(OLD)
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Load Balancer | Agentless | 5m |
| Log | Load Balancer | N/A | - |
| Status | Load Balancer | Agentless | 5m |
Load Balancer Listener(OLD)
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Load Balancer Listener | Agentless | 5m |
| Log | Load Balancer Listener | N/A | - |
| Status | Load Balancer Listener | Agentless | 5m |
Load Balancer
Load Balancer
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Load Balancer | Agentless | 5m |
| Log | Load Balancer | N/A | - |
| Status | Load Balancer | Agentless | 5m |
Load Balancer Listener
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Load Balancer Listener | Agentless | 5m |
| Log | Load Balancer Listener | N/A | - |
| Status | Load Balancer Listener | Agentless | 5m |
Load Balancer Server Group
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Load Balancer Server Group | Agentless | 5m |
| Log | Load Balancer Server Group | N/A | - |
| Status | Load Balancer Server Group | Agentless | 5m |
Direct Connect
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Direct Connect | Agentless | 5m |
| Log | Direct Connect | N/A | - |
| Status | Direct Connect | N/A | - |
Cloud WAN
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Cloud WAN | Agentless | 10m |
| Log | Cloud WAN | N/A | - |
| Status | Cloud WAN | Agentless | 10m |
Global CDN
| Category | Monitoring target | Collection method | Collection interval |
|---|---|---|---|
| Performance | Global CDN | Agentless | 5m |
| Log | Global CDN | N/A | - |
| Status | Global CDN | Agentless | 5m |
2.8 - Appendix B. Service-specific Performance Metrics
Compute type
Virtual Server
Agentless (basic metrics)
| Performance item group name | Performance item name | Collection unit | Collection interval | Description |
|---|---|---|---|---|
| CPU | CPU Usage/Core [Basic] | % | 1m | Percentage of CPU time used, excluding Idle and IOWait states (normalized by the number of cores; 100% when all four cores are fully utilized) |
| CPU | CPU Cores [Basic] | cnt | 1m | Number of virtual processor cores allocated to the virtual machine |
| Memory | Memory Total [Basic] | bytes | 1m | Memory capacity available for use in the domain |
| Memory | Memory Used [Basic] | bytes | 1m | Current memory usage |
| Memory | Memory Swap In [Basic] | bytes | 1m | Swap In memory in bytes |
| Memory | Memory Swap Out [Basic] | bytes | 1m | Swap Out memory in bytes |
| Memory | Memory Free [Bytes] | bytes | 1m | Unused memory capacity in the system |
| Memory | Memory Usage [Basic] | % | 1m | Current memory usage rate |
| Disk | Disk Read Bytes [Basic] | bytes | 1m | Read byte count |
| Disk | Disk Read Requests [Basic] | cnt | 1m | Read request count |
| Disk | Disk Write Bytes [Basic] | bytes | 1m | Write byte count |
| Disk | Disk Write Requests [Basic] | cnt | 1m | Number of write requests |
| State | Instance State [Basic] | enum | 1m | VM status |
| Network | Network In Bytes [Basic] | bytes | 1m | Number of bytes received |
| Network | Network In Dropped [Basic] | cnt | 1m | Number of received packets dropped |
| Network | Network In Errors [Basic] | cnt | 1m | Number of receive errors |
| Network | Network In Packets [Basic] | cnt | 1m | Number of packets received |
| Network | Network Out Bytes [Basic] | bytes | 1m | Number of bytes transmitted |
| Network | Network Out Dropped [Basic] | cnt | 1m | Number of transmitted packets dropped |
| Network | Network Out Errors [Basic] | cnt | 1m | Number of transmission errors |
| Network | Network Out Packets [Basic] | cnt | 1m | Number of packets transmitted |
| Network | Network In Bytes [Delta Basic] | bytes | 1m | Number of bytes received (delta value) |
| Network | Network In Dropped [Delta Basic] | cnt | 1m | Number of received packets dropped (delta value) |
| Network | Network In Errors [Delta Basic] | cnt | 1m | Number of receive errors (delta value) |
| Network | Network In Packets [Delta Basic] | cnt | 1m | Number of packets received (delta value) |
| Network | Network Out Bytes [Delta Basic] | bytes | 1m | Number of bytes transmitted (delta value) |
| Network | Network Out Dropped [Delta Basic] | cnt | 1m | Number of transmitted packets dropped (delta value) |
| Network | Network Out Errors [Delta Basic] | cnt | 1m | Number of transmission errors (delta value) |
| Network | Network Out Packets [Delta Basic] | cnt | 1m | Number of packets transmitted (delta value) |
- On Windows, memory performance metrics are provided only when the performance agent is installed.
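The CPU Usage/Core metrics in this appendix are described as normalized by the number of cores, so a host whose four cores are all fully used reports 100% rather than 400%. The following is a minimal sketch of that arithmetic. The function name `normalize_cpu_usage` is hypothetical and only illustrates the relationship between the two figures.

```python
def normalize_cpu_usage(total_percent: float, cores: int) -> float:
    """Convert a summed per-core CPU percentage into a 0..100% figure.

    `total_percent` is the unnormalized value, which can reach
    100 * cores. Dividing by the core count gives the normalized value.
    """
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return total_percent / cores
```

For a 4-core host, an unnormalized value of 400% corresponds to a normalized value of 100%, and 200% corresponds to 50%.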
Agent (Detailed Metrics)
| Performance item group name | Performance item name | Collection unit | Collection interval | Description |
|---|---|---|---|---|
| CPU | Core Usage [IO Wait] | % | 1m | Ratio of CPU time spent in wait state (disk wait) |
| CPU | Core Usage [System] | % | 1m | Proportion of CPU time spent in kernel space |
| CPU | Core Usage [User] | % | 1m | Proportion of CPU time spent in user space |
| CPU | CPU Cores | cnt | 1m | The number of CPU cores on the host. The maximum value of the unnormalized ratio is 100%* of a core. The unnormalized ratio already incorporates this value, and the maximum value is 100%* of a core. |
| CPU | CPU Usage [Active] | % | 1m | Percentage of CPU time used excluding Idle and IOWait states (when all 4 cores are used at 100%: 400%) |
| CPU | CPU Usage [Idle] | % | 1m | It is the proportion of CPU time spent in idle state. |
| CPU | CPU Usage [IO Wait] | % | 1m | It is the proportion of CPU time spent in a waiting state (disk wait). |
| CPU | CPU Usage [System] | % | 1m | Percentage of CPU time used by the kernel (when all 4 cores are used at 100%: 400%) |
| CPU | CPU Usage [User] | % | 1m | Percentage of CPU time used in user space. (If all four cores are used at 100%, it is 400%) |
| CPU | CPU Usage/Core [Active] | % | 1m | Percentage of CPU time used excluding Idle and IOWait states (value normalized by the number of cores; 100% when all four cores are fully utilized) |
| CPU | CPU Usage/Core [Idle] | % | 1m | It is the proportion of CPU time spent in idle state. |
| CPU | CPU Usage/Core [IO Wait] | % | 1m | It is the proportion of CPU time spent in a waiting state (disk wait). |
| CPU | CPU Usage/Core [System] | % | 1m | Percentage of CPU time used by the kernel (value normalized by the number of cores; 100% when all four cores are fully utilized) |
| CPU | CPU Usage/Core [User] | % | 1m | Percentage of CPU time used in user space. (Value normalized by the number of cores; using all four cores at 100% equals 100%) |
| Disk | Disk CPU Usage [IO Request] | % | 1m | Proportion of CPU time during which I/O requests were issued to the device (device bandwidth utilization). The device is saturated when this value approaches 100%. |
| Disk | Disk Queue Size [Avg] | num | 1m | Average queue length of the requests issued to the device. |
| Disk | Disk Read Bytes | bytes | 1m | The number of bytes per second read from the device. |
| Disk | Disk Read Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.read.bytes_delta of individual disks |
| Disk | Disk Read Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta] | bytes | 1m | Delta of the system.diskio.read.bytes value for each disk |
| Disk | Disk Read Bytes [Success] | bytes | 1m | Total number of bytes successfully read. On Linux, assuming a sector size of 512, it is the number of sectors read multiplied by 512. |
| Disk | Disk Read Requests | cnt | 1m | Number of read requests to the disk device per second |
| Disk | Disk Read Requests [Delta Avg] | cnt | 1m | Average of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Max] | cnt | 1m | Maximum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Success Delta] | cnt | 1m | Delta of system.diskio.read.count for each disk |
| Disk | Disk Read Requests [Success] | cnt | 1m | Total number of reads successfully completed |
| Disk | Disk Request Size [Avg] | num | 1m | Average size of requests executed on the device (unit: sectors). |
| Disk | Disk Service Time [Avg] | ms | 1m | Average service time (ms) of I/O requests issued to the device. |
| Disk | Disk Wait Time [Avg] | ms | 1m | Average time taken for requests issued to the device to be served. |
| Disk | Disk Wait Time [Read] | ms | 1m | Average wait time for read requests |
| Disk | Disk Wait Time [Write] | ms | 1m | Average wait time for write requests |
| Disk | Disk Write Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.write.bytes_delta of individual disks |
| Disk | Disk Write Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta] | bytes | 1m | Delta of the system.diskio.write.bytes value for each individual disk |
| Disk | Disk Write Bytes [Success] | bytes | 1m | Total number of bytes successfully written. On Linux, assuming a sector size of 512, it is the number of sectors written multiplied by 512. |
| Disk | Disk Write Requests | cnt | 1m | Number of write requests to the disk device per second |
| Disk | Disk Write Requests [Delta Avg] | cnt | 1m | Average of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Max] | cnt | 1m | Maximum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Success Delta] | cnt | 1m | Delta of system.diskio.write.count for each disk |
| Disk | Disk Write Requests [Success] | cnt | 1m | Total number of successful writes |
| Disk | Disk Writes Bytes | bytes | 1m | The number of bytes per second written to the device. |
| FileSystem | Filesystem Hang Check | state | 1m | Filesystem (local/NFS) hang check (normal: 1, abnormal: 0) |
| FileSystem | Filesystem Nodes | cnt | 1m | Total number of file nodes in the file system. |
| FileSystem | Filesystem Nodes [Free] | cnt | 1m | Total number of available file nodes in the file system. |
| FileSystem | Filesystem Size [Available] | bytes | 1m | Disk space (bytes) available to an unprivileged user. |
| FileSystem | Filesystem Size [Free] | bytes | 1m | Available disk space (bytes) |
| FileSystem | Filesystem Size [Total] | bytes | 1m | Total disk space (bytes) |
| FileSystem | Filesystem Usage | % | 1m | Used disk space percentage |
| FileSystem | Filesystem Usage [Avg] | % | 1m | Average of individual filesystem.used.pct |
| FileSystem | Filesystem Usage [Inode] | % | 1m | inode usage |
| FileSystem | Filesystem Usage [Max] | % | 1m | Maximum of individual filesystem.used.pct values |
| FileSystem | Filesystem Usage [Min] | % | 1m | Minimum of individual filesystem.used.pct values |
| FileSystem | Filesystem Usage [Total] | % | 1m | - |
| FileSystem | Filesystem Used | bytes | 1m | Used disk space (bytes) |
| FileSystem | Filesystem Used [Inode] | bytes | 1m | inode usage |
| Memory | Memory Free | bytes | 1m | Total amount of available memory (bytes). Memory used by system cache and buffers is not included (see system.memory.actual.free). |
| Memory | Memory Free [Actual] | bytes | 1m | Actual usable memory (bytes). The calculation method varies by OS. On Linux, it is MemAvailable from /proc/meminfo or, if /proc/meminfo is unavailable, it is calculated from free memory plus cache and buffers. On macOS, it is the sum of free memory and inactive memory. On Windows, it is the same value as system.memory.free. |
| Memory | Memory Free [Swap] | bytes | 1m | Available swap memory. |
| Memory | Memory Total | bytes | 1m | Total memory |
| Memory | Memory Total [Swap] | bytes | 1m | Total swap memory. |
| Memory | Memory Usage | % | 1m | Percentage of used memory |
| Memory | Memory Usage [Actual] | % | 1m | Percentage of memory actually used |
| Memory | Memory Usage [Cache Swap] | % | 1m | Cached swap usage |
| Memory | Memory Usage [Swap] | % | 1m | Percentage of used swap memory |
| Memory | Memory Used | bytes | 1m | Used memory |
| Memory | Memory Used [Actual] | bytes | 1m | Actual memory used (bytes). The value obtained by subtracting available memory from total memory. Available memory is calculated differently for each OS (see system.actual.free). |
| Memory | Memory Used [Swap] | bytes | 1m | Used swap memory. |
| Network | Collisions | cnt | 1m | Network collision |
| Network | Network In Bytes | bytes | 1m | Number of received bytes |
| Network | Network In Bytes [Delta Avg] | bytes | 1m | Average of system.network.in.bytes_delta for individual networks |
| Network | Network In Bytes [Delta Max] | bytes | 1m | Maximum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Min] | bytes | 1m | Minimum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Sum] | bytes | 1m | Sum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta] | bytes | 1m | Delta of received byte count |
| Network | Network In Dropped | cnt | 1m | Number of incoming packets that were dropped |
| Network | Network In Errors | cnt | 1m | Number of errors during reception |
| Network | Network In Packets | cnt | 1m | Number of received packets |
| Network | Network In Packets [Delta Avg] | cnt | 1m | Average of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Max] | cnt | 1m | Maximum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Min] | cnt | 1m | Minimum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Sum] | cnt | 1m | Sum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta] | cnt | 1m | Delta of received packet count |
| Network | Network Out Bytes | bytes | 1m | Number of transmitted bytes |
| Network | Network Out Bytes [Delta Avg] | bytes | 1m | Average of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Max] | bytes | 1m | Maximum of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Min] | bytes | 1m | Minimum system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Sum] | bytes | 1m | Sum of system.network.out.bytes_delta of individual networks |
| Network | Network Out Bytes [Delta] | bytes | 1m | Delta of transmitted byte count |
| Network | Network Out Dropped | cnt | 1m | Number of outgoing packets that were dropped. This value is always 0 on Darwin and BSD because the operating system does not report it. |
| Network | Network Out Errors | cnt | 1m | Number of errors during transmission |
| Network | Network Out Packets | cnt | 1m | Number of transmitted packets |
| Network | Network Out Packets [Delta Avg] | cnt | 1m | Average of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Max] | cnt | 1m | Maximum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Min] | cnt | 1m | Minimum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Sum] | cnt | 1m | Sum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta] | cnt | 1m | Delta of transmitted packet count |
| Network | Open Connections [TCP] | cnt | 1m | All open TCP connections |
| Network | Open Connections [UDP] | cnt | 1m | All open UDP connections |
| Network | Port Usage | % | 1m | Usage rate of connectable ports |
| Network | SYN Sent Sockets | cnt | 1m | Number of sockets in SYN_SENT state (when connecting from local to remote) |
| Process | Kernel PID Max | cnt | 1m | kernel.pid_max value |
| Process | Kernel Thread Max | cnt | 1m | kernel.threads-max value |
| Process | Process CPU Usage | % | 1m | The percentage of CPU time consumed by the process since the last update. This value is similar to the %CPU value shown for the process by the top command on Unix systems. |
| Process | Process CPU Usage/Core | % | 1m | The percentage of CPU time used by the process since the last event. Normalized by the number of cores, with a value between 0 and 100%. |
| Process | Process Memory Usage | % | 1m | Proportion of main memory (RAM) occupied by the process |
| Process | Process Memory Used | bytes | 1m | Resident set size: the amount of memory the process occupies in RAM. On Windows, it is the current working set size. |
| Process | Process PID | PID | 1m | PID of the process |
| Process | Process PPID | PID | 1m | PID of the parent process |
| Process | Processes [Dead] | cnt | 1m | Number of dead processes |
| Process | Processes [Idle] | cnt | 1m | Number of idle processes |
| Process | Processes [Running] | cnt | 1m | Number of running processes |
| Process | Processes [Sleeping] | cnt | 1m | Number of sleeping processes |
| Process | Processes [Stopped] | cnt | 1m | Number of stopped processes |
| Process | Processes [Total] | cnt | 1m | Total number of processes |
| Process | Processes [Unknown] | cnt | 1m | Number of processes whose state is unknown or could not be retrieved |
| Process | Processes [Zombie] | cnt | 1m | Number of zombie processes |
| Process | Running Process Usage | % | 1m | Process usage rate |
| Process | Running Processes | cnt | 1m | Number of running processes |
| Process | Running Thread Usage | % | 1m | Thread usage rate |
| Process | Running Threads | cnt | 1m | Total number of threads running in running processes |
| System | Context Switches | cnt | 1m | Context switch count (per second) |
| System | Load/Core [1 min] | cnt | 1m | The load over the last 1 minute divided by the number of cores |
| System | Load/Core [15 min] | cnt | 1m | The load over the last 15 minutes divided by the number of cores |
| System | Load/Core [5 min] | cnt | 1m | The load over the last 5 minutes divided by the number of cores |
| System | Multipaths [Active] | cnt | 1m | External storage connection path state = active count |
| System | Multipaths [Failed] | cnt | 1m | External storage connection path state = failed count |
| System | Multipaths [Faulty] | cnt | 1m | External storage connection path state = faulty count |
| System | NTP Offset | num | 1m | Measured offset of the last sample (time difference between the NTP server and the local environment) |
| System | Run Queue Length | num | 1m | Run queue length |
| System | Uptime | ms | 1m | OS uptime (milliseconds). |
| Windows | Context Switchies | cnt | 1m | CPU context switch count (per second) |
| Windows | Disk Read Bytes [Sec] | cnt | 1m | Bytes read per second on a Windows logical disk |
| Windows | Disk Read Time [Avg] | sec | 1m | Average data read time (seconds) |
| Windows | Disk Transfer Time [Avg] | sec | 1m | Average disk wait time |
| Windows | Disk Usage | % | 1m | Disk usage |
| Windows | Disk Write Bytes [Sec] | cnt | 1m | Number of bytes written in one second on a Windows logical disk |
| Windows | Disk Write Time [Avg] | sec | 1m | Average data write time (seconds) |
| Windows | Pagingfile Usage | % | 1m | Paging file usage |
| Windows | Pool Used [Non Paged] | bytes | 1m | Nonpaged Pool usage in kernel memory |
| Windows | Pool Used [Paged] | bytes | 1m | Paged Pool usage in kernel memory |
| Windows | Process [Running] | cnt | 1m | Number of processes currently running |
| Windows | Threads [Running] | cnt | 1m | Number of threads currently running |
| Windows | Threads [Waiting] | cnt | 1m | Number of threads waiting for processor time |
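
The CPU rows in the table above distinguish unnormalized values (up to 100% per core, so 400% on a 4-core host) from the `CPU Usage/Core` values, which are normalized by the number of cores. The following is a minimal sketch of that relationship, not the service's implementation. The function name `normalize_cpu_pct` is hypothetical and used only for illustration.

```python
def normalize_cpu_pct(unnormalized_pct: float, cores: int) -> float:
    """Convert a percentage summed across cores (0 to 100 * cores)
    into a per-core percentage (0 to 100)."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return unnormalized_pct / cores


# A 4-core host with every core fully busy reports 400% unnormalized.
print(normalize_cpu_pct(400.0, 4))  # 100.0
# The same host at 130% unnormalized is at 32.5% per core.
print(normalize_cpu_pct(130.0, 4))  # 32.5
```

When comparing hosts with different core counts, the normalized value is usually the easier one to set thresholds on, because its range does not depend on the host size.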
GPU Server
Agentless (basic metrics)
| Performance Item Group Name | Performance Item Name | Collection Unit | Collection Interval | Description |
|---|---|---|---|---|
| CPU | CPU Usage/Core [Basic] | % | 1m | Percentage of CPU time used, excluding Idle and IOWait states (normalized by the number of cores; 100% when all four cores are fully utilized) |
| CPU | CPU Cores [Basic] | cnt | 1m | Number of virtual processor cores allocated to the virtual machine |
| Memory | Memory Total [Basic] | bytes | 1m | Memory capacity available in the domain |
| Memory | Memory Used [Basic] | bytes | 1m | The amount of memory currently in use |
| Memory | Memory Swap In [Basic] | bytes | 1m | Swap In memory in bytes |
| Memory | Memory Swap Out [Basic] | bytes | 1m | Swap Out memory in bytes |
| Memory | Memory Free [Bytes] | bytes | 1m | Unused memory capacity in the system |
| Memory | Memory Usage [Basic] | % | 1m | Current memory usage rate |
| Disk | Disk Read Bytes [Basic] | bytes | 1m | Read byte count |
| Disk | Disk Read Requests [Basic] | cnt | 1m | Read request count |
| Disk | Disk Write Bytes [Basic] | bytes | 1m | Write byte count |
| Disk | Disk Write Requests [Basic] | cnt | 1m | Number of write requests |
| State | Instance State [Basic] | enum | 1m | VM status |
| Network | Network In Bytes [Basic] | bytes | 1m | Received bytes |
| Network | Network In Dropped [Basic] | cnt | 1m | Incoming packet drop |
| Network | Network In Errors [Basic] | cnt | 1m | Receive error |
| Network | Network In Packets [Basic] | cnt | 1m | Received packet |
| Network | Network Out Bytes [Basic] | bytes | 1m | Transmit bytes |
| Network | Network Out Dropped [Basic] | cnt | 1m | Transmit packet drop |
| Network | Network Out Errors [Basic] | cnt | 1m | Transmission error |
| Network | Network Out Packets [Basic] | cnt | 1m | Transmitted packet |
| Network | Network In Bytes [Delta Basic] | bytes | 1m | Received bytes (delta value) |
| Network | Network In Dropped [Delta Basic] | cnt | 1m | Received packet drop (delta value) |
| Network | Network In Errors [Delta Basic] | cnt | 1m | Receive error (delta value) |
| Network | Network In Packets [Delta Basic] | cnt | 1m | Received packet (delta value) |
| Network | Network Out Bytes [Delta Basic] | bytes | 1m | Transmitted bytes (delta value) |
| Network | Network Out Dropped [Delta Basic] | cnt | 1m | Transmit packet drop (delta value) |
| Network | Network Out Errors [Delta Basic] | cnt | 1m | Transmission error (delta value) |
| Network | Network Out Packets [Delta Basic] | cnt | 1m | Transmitted packet (delta value) |
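
The `[Delta Basic]` rows above report the change in a value between collections rather than the value itself. The sketch below shows one way such deltas could be derived, assuming the `[Basic]` network values are cumulative counters sampled once per collection interval. That assumption, the function name `counter_deltas`, and the counter-reset handling are illustrative and are not confirmed by this guide.

```python
def counter_deltas(samples: list[int]) -> list[int]:
    """Return the change between consecutive cumulative counter samples."""
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        if curr >= prev:
            deltas.append(curr - prev)
        else:
            # Assumption: a decrease means the counter was reset
            # (for example, after a reboot), so the new value is the delta.
            deltas.append(curr)
    return deltas


# Hypothetical "Network In Bytes" samples taken one minute apart.
network_in_bytes = [1_000, 4_500, 9_000, 200]
print(counter_deltas(network_in_bytes))  # [3500, 4500, 200]
```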
Agent (Detailed Metrics)
| Performance Item Group Name | Performance Item Name | Collection Unit | Collection Interval | Description |
|---|---|---|---|---|
| GPU | GPU Count | cnt | 1m | Number of GPUs |
| GPU | GPU Memory Usage | % | 1m | GPU memory usage rate |
| GPU | GPU Memory Used | bytes | 1m | Amount of GPU memory used |
| GPU | GPU Temperature | ℃ | 1m | GPU temperature |
| GPU | GPU Usage | % | 1m | Sum of the utilization of all GPUs (800% when all 8 GPUs are used at 100%) |
| GPU | GPU Usage [Avg] | % | 1m | Overall average GPU utilization (%) |
| GPU | GPU Power Cap | W | 1m | Maximum power capacity of the GPU |
| GPU | GPU Power Usage | W | 1m | Current GPU power usage |
| GPU | GPU Memory Usage [Avg] | % | 1m | Average GPU memory utilization |
| GPU | GPU Count in use | cnt | 1m | Number of GPUs currently utilized by jobs on the node |
| GPU | Execution State for nvidia-smi | state | 1m | Result of running the nvidia-smi command |
| CPU | Core Usage [IO Wait] | % | 1m | Ratio of CPU time spent in wait state (disk wait) |
| CPU | Core Usage [System] | % | 1m | Proportion of CPU time spent in kernel space |
| CPU | Core Usage [User] | % | 1m | Proportion of CPU time spent in user space |
| CPU | CPU Cores | cnt | 1m | The number of CPU cores on the host. Unnormalized percentages have a maximum value of 100% × the number of cores. Normalized percentages already take this value into account and have a maximum value of 100%. |
| CPU | CPU Usage [Active] | % | 1m | Percentage of CPU time used excluding Idle and IOWait states (when all four cores are used at 100%: 400%) |
| CPU | CPU Usage [Idle] | % | 1m | Proportion of CPU time spent in the idle state. |
| CPU | CPU Usage [IO Wait] | % | 1m | Proportion of CPU time spent in a waiting state (disk wait). |
| CPU | CPU Usage [System] | % | 1m | Percentage of CPU time used by the kernel (when all 4 cores are used at 100%: 400%) |
| CPU | CPU Usage [User] | % | 1m | Percentage of CPU time used in user space. (If all 4 cores are used at 100%, it is 400%) |
| CPU | CPU Usage/Core [Active] | % | 1m | Percentage of CPU time used excluding Idle and IOWait states (value normalized by the number of cores; 100% when all four cores are fully utilized) |
| CPU | CPU Usage/Core [Idle] | % | 1m | Proportion of CPU time spent in the idle state. |
| CPU | CPU Usage/Core [IO Wait] | % | 1m | Proportion of CPU time spent in a waiting state (disk wait). |
| CPU | CPU Usage/Core [System] | % | 1m | Percentage of CPU time used by the kernel (value normalized by the number of cores; 100% when all four cores are utilized at 100%) |
| CPU | CPU Usage/Core [User] | % | 1m | Percentage of CPU time used in user space. (Value normalized by the number of cores; using all four cores at 100% equals 100%) |
| Disk | Disk CPU Usage [IO Request] | % | 1m | Proportion of CPU time during which I/O requests were issued to the device (device bandwidth utilization). The device is saturated when this value approaches 100%. |
| Disk | Disk Queue Size [Avg] | num | 1m | Average queue length of the requests issued to the device. |
| Disk | Disk Read Bytes | bytes | 1m | The number of bytes read per second from the device. |
| Disk | Disk Read Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.read.bytes_delta of individual disks |
| Disk | Disk Read Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta] | bytes | 1m | Delta of the system.diskio.read.bytes value for each disk |
| Disk | Disk Read Bytes [Success] | bytes | 1m | Total number of bytes successfully read. On Linux, assuming a sector size of 512, it is the number of sectors read multiplied by 512. |
| Disk | Disk Read Requests | cnt | 1m | Number of read requests to the disk device per second |
| Disk | Disk Read Requests [Delta Avg] | cnt | 1m | Average of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Max] | cnt | 1m | Maximum system.diskio.read.count_delta of individual disks |
| Disk | Disk Read Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Success Delta] | cnt | 1m | Delta of system.diskio.read.count for each disk |
| Disk | Disk Read Requests [Success] | cnt | 1m | Total number of successful reads |
| Disk | Disk Request Size [Avg] | num | 1m | Average size of requests executed on the device (unit: sectors). |
| Disk | Disk Service Time [Avg] | ms | 1m | Average service time (ms) of I/O requests issued to the device. |
| Disk | Disk Wait Time [Avg] | ms | 1m | Average time taken for requests issued to the device to be served. |
| Disk | Disk Wait Time [Read] | ms | 1m | Average wait time for read requests |
| Disk | Disk Wait Time [Write] | ms | 1m | Average wait time for write requests |
| Disk | Disk Write Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.write.bytes_delta of individual disks |
| Disk | Disk Write Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta] | bytes | 1m | Delta of the system.diskio.write.bytes value for each disk |
| Disk | Disk Write Bytes [Success] | bytes | 1m | Total number of bytes successfully written. On Linux, assuming a sector size of 512, it is the number of sectors written multiplied by 512. |
| Disk | Disk Write Requests | cnt | 1m | Number of write requests to the disk device per second |
| Disk | Disk Write Requests [Delta Avg] | cnt | 1m | Average of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Max] | cnt | 1m | Maximum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Success Delta] | cnt | 1m | Delta of system.diskio.write.count for each disk |
| Disk | Disk Write Requests [Success] | cnt | 1m | Total number of successful writes |
| Disk | Disk Writes Bytes | bytes | 1m | The number of bytes per second written to the device. |
| FileSystem | Filesystem Hang Check | state | 1m | Filesystem (local/NFS) hang check (normal: 1, abnormal: 0) |
| FileSystem | Filesystem Nodes | cnt | 1m | Total number of file nodes in the file system. |
| FileSystem | Filesystem Nodes [Free] | cnt | 1m | Total number of available file nodes in the file system. |
| FileSystem | Filesystem Size [Available] | bytes | 1m | Disk space (bytes) available to an unprivileged user. |
| FileSystem | Filesystem Size [Free] | bytes | 1m | Available disk space (bytes) |
| FileSystem | Filesystem Size [Total] | bytes | 1m | Total disk space (bytes) |
| FileSystem | Filesystem Usage | % | 1m | Used disk space percentage |
| FileSystem | Filesystem Usage [Avg] | % | 1m | Average of individual filesystem.used.pct |
| FileSystem | Filesystem Usage [Inode] | % | 1m | inode usage |
| FileSystem | Filesystem Usage [Max] | % | 1m | Maximum of individual filesystem.used.pct values |
| FileSystem | Filesystem Usage [Min] | % | 1m | Minimum of individual filesystem.used.pct values |
| FileSystem | Filesystem Usage [Total] | % | 1m | - |
| FileSystem | Filesystem Used | bytes | 1m | Used disk space (bytes) |
| FileSystem | Filesystem Used [Inode] | bytes | 1m | inode usage |
| Memory | Memory Free | bytes | 1m | Total amount of available memory (bytes). Does not include memory used by system cache and buffers (see system.memory.actual.free). |
| Memory | Memory Free [Actual] | bytes | 1m | Actual usable memory (bytes). The calculation method varies by OS. On Linux, it is MemAvailable from /proc/meminfo or, if /proc/meminfo is unavailable, it is calculated from free memory plus cache and buffers. On macOS, it is the sum of free memory and inactive memory. On Windows, it is the same value as system.memory.free. |
| Memory | Memory Free [Swap] | bytes | 1m | Available swap memory. |
| Memory | Memory Total | bytes | 1m | Total memory |
| Memory | Memory Total [Swap] | bytes | 1m | Total swap memory. |
| Memory | Memory Usage | % | 1m | Percentage of used memory |
| Memory | Memory Usage [Actual] | % | 1m | Percentage of memory actually used |
| Memory | Memory Usage [Cache Swap] | % | 1m | Cached swap usage |
| Memory | Memory Usage [Swap] | % | 1m | Percentage of used swap memory |
| Memory | Memory Used | bytes | 1m | Used memory |
| Memory | Memory Used [Actual] | bytes | 1m | Actual memory used (bytes). The value obtained by subtracting available memory from total memory. Available memory is calculated differently for each OS (see system.actual.free). |
| Memory | Memory Used [Swap] | bytes | 1m | Used swap memory. |
| Network | Collisions | cnt | 1m | Network collision |
| Network | Network In Bytes | bytes | 1m | Number of received bytes |
| Network | Network In Bytes [Delta Avg] | bytes | 1m | Average of system.network.in.bytes_delta for individual networks |
| Network | Network In Bytes [Delta Max] | bytes | 1m | Maximum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Min] | bytes | 1m | Minimum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Sum] | bytes | 1m | Sum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta] | bytes | 1m | Delta of received byte count |
| Network | Network In Dropped | cnt | 1m | Number of incoming packets that were dropped |
| Network | Network In Errors | cnt | 1m | Number of errors during reception |
| Network | Network In Packets | cnt | 1m | Number of received packets |
| Network | Network In Packets [Delta Avg] | cnt | 1m | Average of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Max] | cnt | 1m | Maximum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Min] | cnt | 1m | Minimum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Sum] | cnt | 1m | Sum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta] | cnt | 1m | Delta of received packet count |
| Network | Network Out Bytes | bytes | 1m | Number of transmitted bytes |
| Network | Network Out Bytes [Delta Avg] | bytes | 1m | Average of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Max] | bytes | 1m | Maximum of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Min] | bytes | 1m | Minimum of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Sum] | bytes | 1m | Sum of system.network.out.bytes_delta for individual networks |
| Network | Network Out Bytes [Delta] | bytes | 1m | Delta of transmitted byte count |
| Network | Network Out Dropped | cnt | 1m | Number of outgoing packets that were dropped. This value is always 0 on Darwin and BSD because the operating system does not report it. |
| Network | Network Out Errors | cnt | 1m | Number of errors during transmission |
| Network | Network Out Packets | cnt | 1m | Number of transmitted packets |
| Network | Network Out Packets [Delta Avg] | cnt | 1m | Average of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Max] | cnt | 1m | Maximum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Min] | cnt | 1m | Minimum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Sum] | cnt | 1m | Sum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta] | cnt | 1m | Delta of transmitted packet count |
| Network | Open Connections [TCP] | cnt | 1m | All open TCP connections |
| Network | Open Connections [UDP] | cnt | 1m | All open UDP connections |
| Network | Port Usage | % | 1m | Connectable port utilization |
| Network | SYN Sent Sockets | cnt | 1m | Number of sockets in SYN_SENT state (when connecting from local to remote) |
| Process | Kernel PID Max | cnt | 1m | kernel.pid_max value |
| Process | Kernel Thread Max | cnt | 1m | kernel.threads-max value |
| Process | Process CPU Usage | % | 1m | The percentage of CPU time consumed by the process since the last update. This value is similar to the %CPU value shown for the process by the top command on Unix systems. |
| Process | Process CPU Usage/Core | % | 1m | The percentage of CPU time used by the process since the last event. Normalized by the number of cores, with a value between 0 and 100%. |
| Process | Process Memory Usage | % | 1m | The proportion of main memory (RAM) occupied by the process |
| Process | Process Memory Used | bytes | 1m | Resident set size: the amount of memory the process occupies in RAM. On Windows, it is the current working set size. |
| Process | Process PID | PID | 1m | PID of the process |
| Process | Process PPID | PID | 1m | Parent process PID |
| Process | Processes [Dead] | cnt | 1m | Number of dead processes |
| Process | Processes [Idle] | cnt | 1m | Number of idle processes |
| Process | Processes [Running] | cnt | 1m | Number of running processes |
| Process | Processes [Sleeping] | cnt | 1m | Number of sleeping processes |
| Process | Processes [Stopped] | cnt | 1m | Number of stopped processes |
| Process | Processes [Total] | cnt | 1m | Total number of processes |
| Process | Processes [Unknown] | cnt | 1m | Number of processes whose state is unknown or could not be retrieved |
| Process | Processes [Zombie] | cnt | 1m | Number of zombie processes |
| Process | Running Process Usage | % | 1m | Process usage rate |
| Process | Running Processes | cnt | 1m | Number of running processes |
| Process | Running Thread Usage | % | 1m | Thread usage rate |
| Process | Running Threads | cnt | 1m | Total number of threads running in running processes |
| System | Context Switches | cnt | 1m | Context switch count (per second) |
| System | Load/Core [1 min] | cnt | 1m | The load over the last 1 minute divided by the number of cores |
| System | Load/Core [15 min] | cnt | 1m | The load over the last 15 minutes divided by the number of cores |
| System | Load/Core [5 min] | cnt | 1m | The load over the last 5 minutes divided by the number of cores |
| System | Multipaths [Active] | cnt | 1m | External storage connection path state = active count |
| System | Multipaths [Failed] | cnt | 1m | External storage connection path state = failed count |
| System | Multipaths [Faulty] | cnt | 1m | External storage connection path state = faulty count |
| System | NTP Offset | num | 1m | Measured offset of the last sample (time difference between the NTP server and the local environment) |
| System | Run Queue Length | num | 1m | Run queue length |
| System | Uptime | ms | 1m | OS uptime (milliseconds) |
| Windows | Context Switchies | cnt | 1m | CPU context switch count (per second) |
| Windows | Disk Read Bytes [Sec] | cnt | 1m | Number of bytes read in one second from a Windows logical disk |
| Windows | Disk Read Time [Avg] | sec | 1m | Average data read time (seconds) |
| Windows | Disk Transfer Time [Avg] | sec | 1m | Disk average wait time |
| Windows | Disk Usage | % | 1m | Disk usage |
| Windows | Disk Write Bytes [Sec] | cnt | 1m | Bytes written per second on a Windows logical disk |
| Windows | Disk Write Time [Avg] | sec | 1m | Average data write time (seconds) |
| Windows | Pagingfile Usage | % | 1m | Paging file usage |
| Windows | Pool Used [Non Paged] | bytes | 1m | Nonpaged Pool usage in kernel memory |
| Windows | Pool Used [Paged] | bytes | 1m | Paged Pool usage in kernel memory |
| Windows | Process [Running] | cnt | 1m | Number of processes currently running |
| Windows | Threads [Running] | cnt | 1m | Number of threads currently running |
| Windows | Threads [Waiting] | cnt | 1m | Number of threads waiting for processor time |
Bare Metal Server
Agent (detailed metrics)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| CPU | Core Usage [IO Wait] | % | 1m | Proportion of CPU time spent waiting (disk wait) |
| CPU | Core Usage [System] | % | 1m | Proportion of CPU time spent in kernel space |
| CPU | Core Usage [User] | % | 1m | Proportion of CPU time spent in user space |
| CPU | CPU Cores | cnt | 1m | The number of CPU cores on the host. Non-normalized percentages have a maximum of 100% multiplied by the number of cores. Normalized percentages already take this value into account and have a maximum of 100%. |
| CPU | CPU Usage [Active] | % | 1m | Percentage of CPU time used, excluding the Idle and IOWait states (for example, 400% when all 4 cores are used at 100%) |
| CPU | CPU Usage [Idle] | % | 1m | Proportion of CPU time spent in the idle state |
| CPU | CPU Usage [IO Wait] | % | 1m | Proportion of CPU time spent waiting (disk wait) |
| CPU | CPU Usage [System] | % | 1m | Percentage of CPU time used by the kernel (for example, 400% when all 4 cores are used at 100%) |
| CPU | CPU Usage [User] | % | 1m | Percentage of CPU time used in user space (for example, 400% when all 4 cores are used at 100%) |
| CPU | CPU Usage/Core [Active] | % | 1m | Percentage of CPU time used, excluding the Idle and IOWait states (normalized by the number of cores; for example, 100% when all 4 cores are used at 100%) |
| CPU | CPU Usage/Core [Idle] | % | 1m | Proportion of CPU time spent in the idle state (normalized by the number of cores) |
| CPU | CPU Usage/Core [IO Wait] | % | 1m | Proportion of CPU time spent waiting (disk wait), normalized by the number of cores |
| CPU | CPU Usage/Core [System] | % | 1m | Percentage of CPU time used by the kernel (normalized by the number of cores; for example, 100% when all 4 cores are used at 100%) |
| CPU | CPU Usage/Core [User] | % | 1m | Percentage of CPU time used in user space (normalized by the number of cores; for example, 100% when all 4 cores are used at 100%) |
| Disk | Disk CPU Usage [IO Request] | % | 1m | The proportion of CPU time during which I/O requests were issued to the device (device bandwidth utilization). The device is saturated when this value approaches 100%. |
| Disk | Disk Queue Size [Avg] | num | 1m | The average queue length of requests issued to the device. |
| Disk | Disk Read Bytes | bytes | 1m | The number of bytes read per second from the device. |
| Disk | Disk Read Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.read.bytes_delta of individual disks |
| Disk | Disk Read Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.read.bytes_delta for individual disks |
| Disk | Disk Read Bytes [Delta] | bytes | 1m | Delta of the system.diskio.read.bytes value for each disk |
| Disk | Disk Read Bytes [Success] | bytes | 1m | Total number of bytes successfully read. On Linux, the sector size is assumed to be 512, and the value is the number of sectors read multiplied by 512. |
| Disk | Disk Read Requests | cnt | 1m | Number of read requests to the disk device per second |
| Disk | Disk Read Requests [Delta Avg] | cnt | 1m | Average of the system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Max] | cnt | 1m | Maximum system.diskio.read.count_delta of individual disks |
| Disk | Disk Read Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.read.count_delta for individual disks |
| Disk | Disk Read Requests [Success Delta] | cnt | 1m | Delta of system.diskio.read.count for each disk |
| Disk | Disk Read Requests [Success] | cnt | 1m | Total number of successful reads |
| Disk | Disk Request Size [Avg] | num | 1m | Average size of requests issued to the device (unit: sectors) |
| Disk | Disk Service Time [Avg] | ms | 1m | Average service time (ms) of I/O requests issued to the device |
| Disk | Disk Wait Time [Avg] | ms | 1m | Average time (ms) taken for requests issued to the device to be served |
| Disk | Disk Wait Time [Read] | ms | 1m | Average disk wait time for read requests |
| Disk | Disk Wait Time [Write] | ms | 1m | Average disk wait time for write requests |
| Disk | Disk Write Bytes [Delta Avg] | bytes | 1m | Average of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Max] | bytes | 1m | Maximum system.diskio.write.bytes_delta of individual disks |
| Disk | Disk Write Bytes [Delta Min] | bytes | 1m | Minimum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta Sum] | bytes | 1m | Sum of system.diskio.write.bytes_delta for individual disks |
| Disk | Disk Write Bytes [Delta] | bytes | 1m | Delta of the system.diskio.write.bytes value for each disk |
| Disk | Disk Write Bytes [Success] | bytes | 1m | Total number of bytes successfully written. On Linux, the sector size is assumed to be 512, and the value is the number of sectors written multiplied by 512. |
| Disk | Disk Write Requests | cnt | 1m | Number of write requests to the disk device per second |
| Disk | Disk Write Requests [Delta Avg] | cnt | 1m | Average of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Max] | cnt | 1m | Maximum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Min] | cnt | 1m | Minimum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Delta Sum] | cnt | 1m | Sum of system.diskio.write.count_delta for individual disks |
| Disk | Disk Write Requests [Success Delta] | cnt | 1m | Delta of system.diskio.write.count for each disk |
| Disk | Disk Write Requests [Success] | cnt | 1m | Total number of successful writes |
| Disk | Disk Writes Bytes | bytes | 1m | The number of bytes per second written to the device. |
| FileSystem | Filesystem Hang Check | state | 1m | filesystem(local/NFS) hang check (normal:1, abnormal:0) |
| FileSystem | Filesystem Nodes | cnt | 1m | Total number of file nodes in the file system. |
| FileSystem | Filesystem Nodes [Free] | cnt | 1m | Total number of available file nodes in the file system. |
| FileSystem | Filesystem Size [Available] | bytes | 1m | Disk space (bytes) available to an unprivileged user |
| FileSystem | Filesystem Size [Free] | bytes | 1m | Available disk space (bytes) |
| FileSystem | Filesystem Size [Total] | bytes | 1m | Total disk space (bytes) |
| FileSystem | Filesystem Usage | % | 1m | Used disk space percentage |
| FileSystem | Filesystem Usage [Avg] | % | 1m | Average of individual filesystem.used.pct |
| FileSystem | Filesystem Usage [Inode] | % | 1m | inode usage |
| FileSystem | Filesystem Usage [Max] | % | 1m | max among individual filesystem.used.pct |
| FileSystem | Filesystem Usage [Min] | % | 1m | minimum among individual filesystem.used.pct |
| FileSystem | Filesystem Usage [Total] | % | 1m | - |
| FileSystem | Filesystem Used | bytes | 1m | Used disk space (bytes) |
| FileSystem | Filesystem Used [Inode] | bytes | 1m | inode usage |
| Memory | Memory Free | bytes | 1m | Total amount of available memory (bytes). Does not include memory used by system cache and buffers (see system.memory.actual.free). |
| Memory | Memory Free [Actual] | bytes | 1m | Actual usable memory (bytes). The calculation method varies by OS. On Linux, it is MemAvailable from /proc/meminfo, or, if /proc/meminfo cannot be used, it is calculated from free memory plus cache and buffers. On macOS, it is the sum of free memory and inactive memory. On Windows, it is the same value as system.memory.free. |
| Memory | Memory Free [Swap] | bytes | 1m | Available swap memory. |
| Memory | Memory Total | bytes | 1m | total memory |
| Memory | Memory Total [Swap] | bytes | 1m | Total swap memory. |
| Memory | Memory Usage | % | 1m | Percentage of used memory |
| Memory | Memory Usage [Actual] | % | 1m | Percentage of memory actually used |
| Memory | Memory Usage [Cache Swap] | % | 1m | Cached swap usage |
| Memory | Memory Usage [Swap] | % | 1m | Percentage of used swap memory |
| Memory | Memory Used | bytes | 1m | used memory |
| Memory | Memory Used [Actual] | bytes | 1m | Actual memory used (bytes). The value obtained by subtracting available memory from total memory. Available memory is calculated differently for each OS (see system.actual.free). |
| Memory | Memory Used [Swap] | bytes | 1m | Used swap memory. |
| Network | Collisions | cnt | 1m | Number of network collisions |
| Network | Network In Bytes | bytes | 1m | Number of received bytes |
| Network | Network In Bytes [Delta Avg] | bytes | 1m | Average of system.network.in.bytes_delta for individual networks |
| Network | Network In Bytes [Delta Max] | bytes | 1m | Maximum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Min] | bytes | 1m | Minimum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta Sum] | bytes | 1m | Sum of system.network.in.bytes_delta for each network |
| Network | Network In Bytes [Delta] | bytes | 1m | Delta of received byte count |
| Network | Network In Dropped | cnt | 1m | Number of incoming packets that were dropped |
| Network | Network In Errors | cnt | 1m | Number of errors during reception |
| Network | Network In Packets | cnt | 1m | Number of received packets |
| Network | Network In Packets [Delta Avg] | cnt | 1m | Average of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Max] | cnt | 1m | Maximum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Min] | cnt | 1m | Minimum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta Sum] | cnt | 1m | Sum of system.network.in.packets_delta for each network |
| Network | Network In Packets [Delta] | cnt | 1m | Delta of received packet count |
| Network | Network Out Bytes | bytes | 1m | Number of transmitted bytes |
| Network | Network Out Bytes [Delta Avg] | bytes | 1m | Average of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Max] | bytes | 1m | Maximum of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Min] | bytes | 1m | Minimum of system.network.out.bytes_delta for each network |
| Network | Network Out Bytes [Delta Sum] | bytes | 1m | Sum of system.network.out.bytes_delta for individual networks |
| Network | Network Out Bytes [Delta] | bytes | 1m | Delta of transmitted byte count |
| Network | Network Out Dropped | cnt | 1m | Number of outgoing packets that were dropped. This value is always 0 on Darwin and BSD because the operating system does not report it. |
| Network | Network Out Errors | cnt | 1m | Number of errors during transmission |
| Network | Network Out Packets | cnt | 1m | Number of transmitted packets |
| Network | Network Out Packets [Delta Avg] | cnt | 1m | Average of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Max] | cnt | 1m | Maximum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Min] | cnt | 1m | Minimum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta Sum] | cnt | 1m | Sum of system.network.out.packets_delta for each network |
| Network | Network Out Packets [Delta] | cnt | 1m | Delta of transmitted packet count |
| Network | Open Connections [TCP] | cnt | 1m | All open TCP connections |
| Network | Open Connections [UDP] | cnt | 1m | All open UDP connections |
| Network | Port Usage | % | 1m | Connectable port utilization |
| Network | SYN Sent Sockets | cnt | 1m | Number of sockets in SYN_SENT state (when connecting from local to remote) |
| Process | Kernel PID Max | cnt | 1m | kernel.pid_max value |
| Process | Kernel Thread Max | cnt | 1m | kernel.threads-max value |
| Process | Process CPU Usage | % | 1m | The percentage of CPU time consumed by the process since the last update. This value is similar to the %CPU value shown for the process by the top command on Unix systems. |
| Process | Process CPU Usage/Core | % | 1m | The percentage of CPU time used by the process since the last event. Normalized by the number of cores, with a value between 0 and 100%. |
| Process | Process Memory Usage | % | 1m | The proportion of main memory (RAM) occupied by the process |
| Process | Process Memory Used | bytes | 1m | Resident Set size. The amount of memory a process occupies in RAM. In Windows, it is the current working set size. |
| Process | Process PID | PID | 1m | Process PID |
| Process | Process PPID | PID | 1m | Parent process PID |
| Process | Processes [Dead] | cnt | 1m | Number of dead processes |
| Process | Processes [Idle] | cnt | 1m | Number of idle processes |
| Process | Processes [Running] | cnt | 1m | Number of running processes |
| Process | Processes [Sleeping] | cnt | 1m | Number of sleeping processes |
| Process | Processes [Stopped] | cnt | 1m | Number of stopped processes |
| Process | Processes [Total] | cnt | 1m | Total number of processes |
| Process | Processes [Unknown] | cnt | 1m | Number of processes whose state is unknown or cannot be retrieved |
| Process | Processes [Zombie] | cnt | 1m | Number of zombie processes |
| Process | Running Process Usage | % | 1m | process usage rate |
| Process | Running Processes | cnt | 1m | running processes count |
| Process | Running Thread Usage | % | 1m | Thread usage rate |
| Process | Running Threads | cnt | 1m | Total number of threads running in running processes |
| System | Context Switches | cnt | 1m | context switch count (per second) |
| System | Load/Core [1 min] | cnt | 1m | The load over the last 1 minute divided by the number of cores |
| System | Load/Core [15 min] | cnt | 1m | The load over the last 15 minutes divided by the number of cores |
| System | Load/Core [5 min] | cnt | 1m | The load over the last 5 minutes divided by the number of cores |
| System | Multipaths [Active] | cnt | 1m | External storage connection path state = active count |
| System | Multipaths [Failed] | cnt | 1m | External storage connection path state = failed count |
| System | Multipaths [Faulty] | cnt | 1m | External storage connection path state = faulty count |
| System | NTP Offset | num | 1m | Measured offset of the last sample (time difference between the NTP server and the local environment) |
| System | Run Queue Length | num | 1m | Run queue length |
| System | Uptime | ms | 1m | OS uptime (milliseconds) |
| Windows | Context Switchies | cnt | 1m | CPU context switch count (per second) |
| Windows | Disk Read Bytes [Sec] | cnt | 1m | Number of bytes read in one second from a Windows logical disk |
| Windows | Disk Read Time [Avg] | sec | 1m | Average data read time (seconds) |
| Windows | Disk Transfer Time [Avg] | sec | 1m | Disk average wait time |
| Windows | Disk Usage | % | 1m | Disk usage |
| Windows | Disk Write Bytes [Sec] | cnt | 1m | Bytes written per second on a Windows logical disk |
| Windows | Disk Write Time [Avg] | sec | 1m | Average data write time (seconds) |
| Windows | Pagingfile Usage | % | 1m | Paging file usage |
| Windows | Pool Used [Non Paged] | bytes | 1m | Nonpaged Pool usage in kernel memory |
| Windows | Pool Used [Paged] | bytes | 1m | Paged Pool usage in kernel memory |
| Windows | Process [Running] | cnt | 1m | Number of processes currently running |
| Windows | Threads [Running] | cnt | 1m | Number of threads currently running |
| Windows | Threads [Waiting] | cnt | 1m | Number of threads waiting for processor time |
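
The relationship between the non-normalized CPU items and the per-core items, and between the per-device delta values and the [Delta Avg], [Delta Max], [Delta Min], and [Delta Sum] items, can be sketched as follows. This is a minimal illustration and not the agent's implementation: it assumes the per-core value is the non-normalized value divided by CPU Cores, and that the delta aggregates are plain statistics over the per-device deltas. The function names and the device names `sda` and `sdb` are hypothetical.

```python
from statistics import mean
from typing import Dict


def normalize_cpu_pct(usage_pct: float, cores: int) -> float:
    """Convert a non-normalized CPU percentage (maximum 100% x cores)
    into a per-core percentage (maximum 100%).
    Assumption: normalization is a plain division by the core count."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return usage_pct / cores


def aggregate_deltas(deltas_by_device: Dict[str, int]) -> Dict[str, float]:
    """Summarize per-device delta values the way the [Delta Avg],
    [Delta Max], [Delta Min], and [Delta Sum] items are described."""
    values = list(deltas_by_device.values())
    if not values:
        raise ValueError("at least one device is required")
    return {
        "avg": mean(values),
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
    }


if __name__ == "__main__":
    # All 4 cores fully used: 400% non-normalized, 100% per core.
    print(normalize_cpu_pct(400.0, 4))  # 100.0
    # Hypothetical read-byte deltas for two disks.
    print(aggregate_deltas({"sda": 100, "sdb": 300}))
```
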
Multi-node GPU Cluster [Cluster Fabric]
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Cluster GPU | Cluster GPU Count | cnt | 1m | Total number of GPUs in the cluster, calculated as the sum of the GPU Count of each node in the same GPU cluster. |
| Cluster GPU | Cluster GPU Count In Use | cnt | 1m | Number of GPUs in use by jobs in the cluster. Calculated by parsing the ‘Processes:’ section at the bottom of the nvidia-smi output on each node in the same GPU cluster and summing the number of GPUs held by processes. |
| Cluster GPU | Cluster GPU Usage | % | 1m | Average GPU utilization in the cluster, calculated as the average of the GPU Utilization values of the nodes in the same GPU cluster. |
| Cluster GPU | Cluster GPU Memory Usage [Avg] | % | 1m | Average GPU memory utilization in the cluster, calculated as the average of the Memory Utilization values of the nodes in the same GPU cluster. |
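
The cluster-level items above are described as sums and averages of node-level values. The sketch below shows that aggregation under stated assumptions: the `NodeGpuSample` type and `cluster_gpu_metrics` function are hypothetical, and the averages are taken as unweighted means of the per-node values, since the table does not say whether nodes are weighted by GPU count.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class NodeGpuSample:
    """Hypothetical per-node values, mirroring the node-level GPU items."""
    gpu_count: int               # GPU Count
    gpu_count_in_use: int        # GPU Count in use
    gpu_usage_avg: float         # GPU Usage [Avg], in percent
    gpu_memory_usage_avg: float  # GPU Memory Usage [Avg], in percent


def cluster_gpu_metrics(nodes: List[NodeGpuSample]) -> Dict[str, float]:
    """Aggregate node samples from one GPU cluster into cluster items."""
    if not nodes:
        raise ValueError("at least one node is required")
    return {
        # Counts are summed across the nodes of the cluster.
        "Cluster GPU Count": sum(n.gpu_count for n in nodes),
        "Cluster GPU Count In Use": sum(n.gpu_count_in_use for n in nodes),
        # Utilization values are averaged across nodes (unweighted).
        "Cluster GPU Usage": mean(n.gpu_usage_avg for n in nodes),
        "Cluster GPU Memory Usage [Avg]": mean(
            n.gpu_memory_usage_avg for n in nodes
        ),
    }


if __name__ == "__main__":
    samples = [
        NodeGpuSample(8, 4, 50.0, 40.0),
        NodeGpuSample(8, 8, 100.0, 80.0),
    ]
    print(cluster_gpu_metrics(samples))
```
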
Multi-node GPU Cluster [Node]
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| GPU | GPU Count | cnt | 1m | Number of GPUs |
| GPU | GPU Memory Usage | % | 1m | GPU memory usage rate (%) |
| GPU | GPU Memory Used | MB | 1m | Amount of GPU memory used (MB) |
| GPU | GPU Temperature | ℃ | 1m | GPU temperature |
| GPU | GPU Usage | % | 1m | GPU utilization (%) |
| GPU | GPU Usage [Avg] | % | 1m | Overall average GPU utilization (%) |
| GPU | GPU Power Cap | W | 1m | Maximum power capacity of the GPU |
| GPU | GPU Power Usage | W | 1m | Current GPU power usage |
| GPU | GPU Memory Usage [Avg] | % | 1m | GPU Memory Utilization Average |
| GPU | GPU Count in use | cnt | 1m | Number of GPUs in use by jobs on the node |
| GPU | Execution State for nvidia-smi | state | 1m | Result of running the nvidia-smi command |
Storage type
File Storage
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Volume | Instance State | state | 1m | File Storage volume status |
| Volume | IOPS [Other] | iops | 1m | IOPS (other) |
| Volume | IOPS [Read] | iops | 1m | IOPS (read) |
| Volume | IOPS [Total] | iops | 1m | IOPS (total) |
| Volume | IOPS [Write] | iops | 1m | IOPS (write) |
| Volume | Latency Time [Other] | usec | 1m | Latency (Other) |
| Volume | Latency Time [Read] | usec | 1m | Read latency |
| Volume | Latency Time [Total] | usec | 1m | Total latency |
| Volume | Latency Time [Write] | usec | 1m | Write latency |
| Volume | Throughput [Other] | bytes/s | 1m | Throughput (Other) |
| Volume | Throughput [Read] | bytes/s | 1m | Throughput (read) |
| Volume | Throughput [Total] | bytes/s | 1m | Throughput (total) |
| Volume | Throughput [Write] | bytes/s | 1m | Throughput (write) |
| Volume | Volume Total | bytes | 1m | Total byte count |
| Volume | Volume Usage | % | 1m | Usage rate |
| Volume | Volume Used | bytes | 1m | Usage |
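
When reading these items together, two small conversions are often useful: deriving a usage percentage from the byte counters and converting latency from microseconds to milliseconds. The sketch below assumes that Volume Usage corresponds to Volume Used divided by Volume Total, which this document does not state explicitly. Both helper names are hypothetical.

```python
def volume_usage_pct(used_bytes: int, total_bytes: int) -> float:
    """Usage rate in percent.
    Assumption: Volume Usage = Volume Used / Volume Total x 100."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes * 100.0


def usec_to_ms(latency_usec: float) -> float:
    """Convert a Latency Time value from microseconds to milliseconds."""
    return latency_usec / 1000.0


if __name__ == "__main__":
    print(volume_usage_pct(50, 200))  # 25.0
    print(usec_to_ms(1500))           # 1.5
```
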
Object Storage
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Request | Requests [Delete] | cnt | 1m | Number of HTTP DELETE requests executed on objects in the bucket |
| Request | Requests [Download Avg] | bytes | 1m | Download usage per bucket |
| Request | Requests [Get] | cnt | 1m | Number of HTTP GET requests executed on objects in the bucket |
| Request | Requests [Head] | cnt | 1m | Number of HTTP HEAD requests executed on objects in the bucket |
| Request | Requests [List] | cnt | 1m | Number of LIST requests executed for objects in the bucket |
| Request | Requests [Post] | cnt | 1m | Number of HTTP POST requests executed on objects in the bucket |
| Request | Requests [Put] | cnt | 1m | Number of HTTP PUT requests executed on objects in the bucket |
| Request | Requests [Total] | cnt | 1m | Total number of HTTP requests executed on the bucket |
| Request | Requests [Upload Avg] | bytes | 1m | Upload usage per bucket |
| Usage | Bucket Used | bytes | 1m | Amount of data stored in the bucket (bytes) |
| Usage | Objects | cnt | 1m | Number of objects stored in the bucket |
Block Storage(BM)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State | state | 1m | Block Storage volume status |
| Volume | IOPS [Total] | iops | 1m | IOPS (total) |
| Volume | IOPS [Read] | iops | 1m | IOPS (read) |
| Volume | IOPS [Write] | iops | 1m | IOPS (write) |
| Volume | IOPS [Other] | iops | 1m | IOPS (other) |
| Volume | Latency Time [Total] | usec | 1m | Total latency |
| Volume | Latency Time [Read] | usec | 1m | Read latency |
| Volume | Latency Time [Write] | usec | 1m | Write latency |
| Volume | Latency Time [Other] | usec | 1m | Latency (Other) |
| Volume | Throughput [Total] | MB/s | 1m | Throughput (total) |
| Volume | Throughput [Read] | MB/s | 1m | Throughput (read) |
| Volume | Throughput [Write] | MB/s | 1m | Throughput (write) |
| Volume | Throughput [Other] | MB/s | 1m | Throughput (Other) |
| Volume | Volume Bytes | bytes | 1m | Total byte count |
Block Storage(VM)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State | state | 1m | Block Storage volume status |
| Volume | IOPS [Read] | iops | 1m | IOPS (read) |
| Volume | IOPS [Write] | iops | 1m | IOPS (write) |
| Volume | Latency Time [Read] | usec | 1m | Read latency |
| Volume | Latency Time [Write] | usec | 1m | Write latency |
| Volume | Throughput [Read] | MB/s | 1m | Throughput (read) |
| Volume | Throughput [Write] | MB/s | 1m | Throughput (write) |
| Volume | Volume Bytes | bytes | 1m | Total byte count |
Database type
PostgreSQL(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activelock | Active Locks [Access Exclusive] | cnt | 1m | Number of access exclusive locks |
| Activelock | Active Locks [Access Share] | cnt | 1m | Number of access share locks |
| Activelock | Active Locks [Total] | cnt | 1m | - |
| Activelock | Exclusive Locks | cnt | 1m | exclusive lock count |
| Activelock | Row Exclusive Locks | cnt | 1m | row exclusive lock count |
| Activelock | Row Share Locks | cnt | 1m | row share lock count |
| Activelock | Share Locks | cnt | 1m | share lock count |
| Activelock | Share Row Exclusive Locks | cnt | 1m | Number of share row exclusive locks |
| Activelock | Share Update Exclusive Locks | cnt | 1m | Number of share update exclusive locks |
| ActiveSession | Active Sessions | cnt | 1m | Number of active sessions |
| ActiveSession | Active Sessions [Total] | cnt | 1m | - |
| ActiveSession | Idle In Transaction Sessions | cnt | 1m | Number of sessions in idle_in_transaction state |
| ActiveSession | Idle In Transaction Sessions [Total] | cnt | 1m | - |
| ActiveSession | Idle Sessions | cnt | 1m | Number of idle sessions |
| ActiveSession | Idle Sessions [Total] | cnt | 1m | - |
| ActiveSession | Waiting Sessions | cnt | 1m | Number of sessions in waiting state |
| ActiveSession | Waiting Sessions [Total] | cnt | 1m | - |
| Connection | Connection Usage | % | 1m | - |
| Connection | Connection Usage [Total] | % | 1m | DB connection usage rate (%) |
| DB Age | DB Age Max | age | 1m | database age (frozen XID) value |
| Lock | Wait Locks | cnt | 1m | Number of lock-waiting sessions (by DB) |
| Lock | Wait Locks [Long Total] | cnt | 1m | Number of sessions that have been waiting on a lock for a long time (300 seconds) |
| Lock | Wait Locks [Long] | cnt | 1m | - |
| Lock | Wait Locks [Total] | cnt | 1m | Number of sessions waiting due to lock occurrence |
| Long Transaction | Transaction Time Max [Long] | sec | 1m | - |
| Long Transaction | Transaction Time Max Total [Long] | sec | 1m | Long-running transaction time (minutes) |
| Replica | Apply Lag Time | sec | 1m | apply_lag time |
| Replica | Check No Replication | cnt | 1m | check_no_replication value |
| Replica | Check Replication | state | 1m | check_replication_state value |
| Slowquery | Slowqueries | cnt | 1m | Number of SQL queries running for a long time (over 5 minutes) |
| State | Instance State [PID] | PID | 1m | postgres process pid |
| Tablespace | Tablespace Used | bytes | 1m | Tablespace size |
| Tablespace | Tablespace Used [Total] | bytes | 1m | - |
| Tablespace | Tablespace Used Bytes [MB] | bytes | 1m | filesystem directory usage (MB) |
| Tablespace | Tablespaces [Total] | cnt | 1m | - |
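
Several items in this table are threshold-based counts or ratios. The sketch below illustrates how such values could be derived from raw session data. It is an illustration under assumptions, not the service's collection logic: the 300-second threshold comes from the table, but whether the comparison is "greater than" or "greater than or equal to" is not stated here, and Connection Usage is assumed to be current connections divided by the configured maximum. All names are hypothetical.

```python
from typing import Iterable

# Threshold listed in the table for long lock waits.
LONG_LOCK_WAIT_SEC = 300


def count_long_lock_waits(
    wait_seconds: Iterable[float],
    threshold_sec: float = LONG_LOCK_WAIT_SEC,
) -> int:
    """Count sessions whose lock wait has reached the threshold.
    Assumption: a wait equal to the threshold already counts as long."""
    return sum(1 for w in wait_seconds if w >= threshold_sec)


def connection_usage_pct(current: int, max_connections: int) -> float:
    """Connection usage in percent.
    Assumption: usage = current connections / maximum connections x 100."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return current / max_connections * 100.0


if __name__ == "__main__":
    # Hypothetical lock wait durations (seconds) for three sessions.
    print(count_long_lock_waits([10, 300, 450]))  # 2
    print(connection_usage_pct(50, 200))          # 25.0
```
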
MariaDB(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activesssion | Active Sessions | cnt | 1m | Number of active sessions |
| Activesssion | Connection Usage [Total] | % | 1m | DB connection session usage rate |
| Activesssion | Connections | cnt | 1m | number of connections |
| Activesssion | Connections [MAX] | cnt | 1m | max connected threads count |
| Datafile | Binary Log Used [MB] | bytes | 1m | binary log usage (MB) |
| Datafile | Data Directory Used [MB] | bytes | 1m | datadir usage (MB) |
| Datafile | Open Files | cnt | 1m | Number of DB files in open state |
| Datafile | Open Files [MAX] | cnt | 1m | Number of DB files that can be opened |
| Datafile | Open Files Usage | % | 1m | Ratio of open DB files to the maximum number of DB files that can be opened |
| Datafile | Relay Log Used [MB] | bytes | 1m | Relay log usage (MB) |
| State | Instance State [PID] | PID | 1m | mariadbd process PID (mysqld process PID for versions prior to v10.5.2) |
| State | Safe PID | PID | 1m | mariadbd_safe process PID (mysqld_safe process PID for versions prior to v10.5.2) |
| State | Slave Behind Master seconds | sec | 1m | Time difference in data between master and slave, in seconds (collected only on the slave) |
| Tablespace | Tablespace Used | bytes | 1m | Tablespace usage |
| Tablespace | Tablespace Used [Total] | bytes | 1m | - |
| Transaction | Running Threads | cnt | 1m | running thread count |
| Transaction | Slowqueries | cnt | 1m | Number of long-running SQL queries (over 5 minutes) (by DB) |
| Transaction | Slowqueries [Total] | cnt | 1m | Number of SQL queries running for a long time (over 5 minutes) (total) |
| Transaction | Transaction Time [Long] | sec | 1m | Transaction maximum execution time (seconds) |
| Transaction | Wait Locks | cnt | 1m | Number of sessions blocked for more than 60 seconds by lock |
MySQL(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activesssion | Active Sessions | cnt | 1m | connected threads count |
| Activesssion | Connection Usage [Total] | % | 1m | DB connection session usage rate |
| Activesssion | Connections | cnt | 1m | number of connections |
| Activesssion | Connections [MAX] | cnt | 1m | max connected threads count |
| Datafile | Binary Log Used [MB] | bytes | 1m | binary log usage (MB) |
| Datafile | Data Directory Used [MB] | bytes | 1m | datadir usage (MB) |
| Datafile | Open Files | cnt | 1m | Number of DB files in open state |
| Datafile | Open Files [MAX] | cnt | 1m | Number of DB files that can be opened |
| Datafile | Open Files Usage | % | 1m | Ratio of open DB files to the maximum number of DB files that can be opened |
| Datafile | Relay Log Used [MB] | bytes | 1m | Relay log usage (MB) |
| State | Instance State [PID] | PID | 1m | mysqld process pid |
| State | Safe PID | PID | 1m | safe program PID |
| State | Slave Behind Master seconds | sec | 1m | Time difference with master node (sec) |
| Tablespace | Tablespace Used | bytes | 1m | Tablespace usage |
| Tablespace | Tablespace Used [Total] | bytes | 1m | Tablespace usage (total) |
| Transaction | Running Threads | cnt | 1m | running thread count |
| Transaction | Slowqueries | cnt | 1m | Number of long-running SQL queries (over 5 minutes) (by DB) |
| Transaction | Slowqueries [Total] | cnt | 1m | Number of SQL queries running for a long time (over 5 minutes) (total) |
| Transaction | Transaction Time [Long] | sec | 1m | Transaction maximum execution time (seconds) |
| Transaction | Wait Locks | cnt | 1m | Number of sessions blocked for more than 60 seconds by lock |
Microsoft SQL Server(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activesssion | Active Sessions | cnt | 1m | Number of active sessions |
| Activetransaction | Active Transactions [Total] | cnt | 1m | Number of active transactions |
| Connection | Connected Users | cnt | 1m | Number of users connected to the system |
| Datafile | Datavolume Size [Free] | bytes | 1m | available space |
| Datafile | DBFiles [Not Online] | cnt | 1m | Number of data files that are not in the ONLINE state, obtained by running a query that checks the state of all data files |
| Datafile | Tablespace Used | bytes | 1m | Data volume size |
| Lock | Lock Processes [Blocked] | cnt | 1m | Number of SQL processes blocked by other processes |
| Lock | Lock Waits [Per Second] | cnt | 1m | Lock wait count per second |
| Slowquery | Blocking Session ID | ID | 1m | ID of the session blocking a SQL query that has been running for a long time (over 5 minutes) |
| Slowquery | Slowqueries | cnt | 1m | Number of SQL queries running for a long time (over 5 minutes) |
| Slowquery | Slowquery CPU Time | ms | 1m | CPU time consumed by SQL execution that runs for a long time (over 5 minutes) |
| Slowquery | Slowquery Execute Context ID | ID | 1m | Context ID associated with the execution task of a SQL that runs for a long time (5 minutes or more) |
| Slowquery | Slowquery Memory Usage | bytes | 1m | Memory usage consumed by the execution of SQL that runs for a long time (over 5 minutes) |
| Slowquery | Slowquery Session ID | ID | 1m | Session ID of SQL queries running for a long time (over 5 minutes) |
| Slowquery | Slowquery Wait Duration Time | ms | 1m | Total wait time for wait type |
| State | Instance State [Cluster] | state | 1m | Status during MSSQL cluster configuration |
| State | Instance State [PID] | PID | 1m | sqlservr.exe process pid |
| State | Page IO Latch Wait Time | ms | 1m | Average wait time for page I/O latch waits |
| Transaction | Transaction Time [MAX] | cnt | 1m | Long-running (5 minutes or more) transaction |
EPAS(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Activelock | Access Exclusive Locks | cnt | 1m | Number of access exclusive locks |
| Activelock | Access Share Locks | cnt | 1m | Number of access share locks |
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activelock | Active Locks [Total] | cnt | 1m | Total number of active locks |
| Activelock | Exclusive Locks | cnt | 1m | exclusive lock count |
| Activelock | Row Exclusive Locks | cnt | 1m | row exclusive lock count |
| Activelock | Row Share Locks | cnt | 1m | row share lock count |
| Activelock | Share Locks | cnt | 1m | share lock count |
| Activelock | Share Row Exclusive Locks | cnt | 1m | Number of share row exclusive locks |
| Activelock | Share Update Exclusive Locks | cnt | 1m | Number of share update exclusive locks |
| Activesession | Active Sessions | cnt | 1m | Number of active sessions |
| Activesession | Active Sessions [Total] | cnt | 1m | Total number of active sessions |
| Activesession | Idle In Transaction Sessions | cnt | 1m | Number of sessions in idle_in_transaction state |
| Activesession | Idle In Transaction Sessions [Total] | cnt | 1m | Total number of sessions in idle_in_transaction state |
| Activesession | Idle Sessions | cnt | 1m | Number of idle sessions |
| Activesession | Idle Sessions [Total] | cnt | 1m | Total number of idle sessions |
| Activesession | Waiting Sessions | cnt | 1m | Number of sessions in waiting state |
| Activesession | Waiting Sessions [Total] | cnt | 1m | Total number of sessions in waiting state |
| Connection | Connection Usage | % | 1m | DB connection usage rate (%) |
| Connection | Connection Usage [Total] | % | 1m | Overall DB connection usage (%) |
| Connection | Connection Usage Per DB | % | 1m | DB connection usage rate (%) by DB |
| DB Age | DB Age Max | age | 1m | database age (frozen XID) value |
| Lock | Wait Locks | cnt | 1m | Number of sessions waiting due to lock occurrence |
| Lock | Wait Locks [Long Total] | cnt | 1m | Total number of sessions with long (300 seconds) lock waits |
| Lock | Wait Locks [Long] | cnt | 1m | Number of sessions with long (300 seconds) lock waits |
| Lock | Wait Locks [Total] | cnt | 1m | Total number of sessions waiting due to lock occurrence |
| Lock | Wait Locks Per DB [Total] | cnt | 1m | Total number of sessions waiting due to lock occurrences per DB |
| Long Transaction | Transaction Time Max [Long] | sec | 1m | Long-running transaction time (minutes) |
| Long Transaction | Transaction Time Max Total [Long] | sec | 1m | Long-running transaction time (minutes) |
| Replica | Apply Lag Time | sec | 1m | apply_lag time |
| Replica | Check No Replication | cnt | 1m | check_no_replication value |
| Replica | Check Replication | state | 1m | check_replication_state value |
| Slowquery | Slowqueries | cnt | 1m | Number of SQL queries running for a long time (over 5 minutes) |
| State | Instance state [PID] | PID | 1m | edb-postgres process pid |
| Tablespace | Tablespace Used Bytes [MB] | bytes | 1m | filesystem directory usage (MB) |
| Tablespace | Tablespaces [Total] | cnt | 1m | Total number of tablespaces |
| Tablespace | Tablespace Used | bytes | 1m | Size of the tablespace in use |
| Tablespace | Tablespace Used [Total] | bytes | 1m | Total size of the Tablespace in use |
CacheStore(DBaaS)
Redis
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Stats | Active Defragmentation Keys [Hits] | cnt | 1m | Number of keys that were actively defragmented |
| Stats | Active Defragmentation Keys [Miss] | cnt | 1m | Number of keys skipped by the active defragmentation process |
| Stats | Active Defragmentations [Hits] | cnt | 1m | Number of value reallocations performed by the active defragmentation process |
| Stats | Active Defragmentations [Miss] | cnt | 1m | Number of aborted value reallocations started by the active defragmentation process |
| Memory | Allocated Bytes [OS] | bytes | 1m | Number of bytes allocated by Redis and recognized by the operating system (resident set size) |
| Memory | Allocated Bytes [Redis] | bytes | 1m | Total bytes allocated by Redis |
| Persistence | AOF Buffer Size | bytes | 1m | AOF buffer size |
| Persistence | AOF File Size [Current] | bytes | 1m | AOF current file size |
| Persistence | AOF File Size [Latest Startup] | bytes | 1m | AOF file size at the latest startup or rewrite |
| Persistence | AOF Rewrite Buffer Size | bytes | 1m | AOF rewrite buffer size |
| Persistence | AOF Rewrite Current Time | sec | 1m | Duration of the ongoing AOF rewrite operation, if any (seconds) |
| Persistence | AOF Rewrite Last Time | sec | 1m | Duration of the last AOF rewrite operation (seconds) |
| Commandstats | Calls | cnt | 1m | Number of calls that reached command execution (not rejected) |
| Commandstats | Calls [Failed] | cnt | 1m | Number of failed calls |
| Commandstats | Calls [Rejected] | cnt | 1m | Number of rejected calls |
| Persistence | Changes [Last Saved] | cnt | 1m | Number of changes since the last dump |
| Clients | Client Output Buffer [MAX] | cnt | 1m | Longest output list among current client connections |
| Clients | Client Input Buffer [MAX] | cnt | 1m | Biggest input buffer among current client connections |
| Sentinel | Clients [Sentinel] | cnt | 1m | Number of client connections (sentinel) |
| Replication | Connected Slaves | cnt | 1m | Number of connected slaves |
| Clients | Connections [Blocked] | cnt | 1m | Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH) |
| Clients | Connections [Current] | cnt | 1m | Number of client connections (excluding slave connections) |
| Persistence | Copy On Write Allocated Size [AOF] | bytes | 1m | Copy-on-write (COW) allocation size during the last AOF rewrite operation |
| Persistence | Copy On Write Allocated Size [RDB] | bytes | 1m | Copy-on-write (COW) allocation size during the last RDB save operation |
| Commandstats | CPU Time [Average] | cnt | 1m | Average CPU used per command execution |
| Commandstats | CPU Time [Total] | usec | 1m | Total CPU time used by these commands |
| CPU | CPU Usage [System Process] | % | 1m | System CPU used by background processes |
| CPU | CPU Usage [System] | % | 1m | System CPU used by the Redis server |
| CPU | CPU Usage [User Process] | % | 1m | User CPU used by background processes |
| CPU | CPU Usage [User] | % | 1m | User CPU used by the Redis server |
| Memory | Dataset Used | bytes | 1m | Dataset size |
| Disk | Disk Used | bytes | 1m | datadir usage |
| Stats | Evicted Keys | cnt | 1m | Number of evicted keys caused by the maxmemory limit |
| Persistence | Fsyncs [Delayed] | cnt | 1m | Delayed fsync counter |
| Persistence | Fsyncs [Pending] | cnt | 1m | Number of pending fsync operations in the background I/O queue (format: bytes) |
| Stats | Full Resyncs | cnt | 1m | Number of full resynchronizations with the slave |
| Stats | Keys [Expired] | cnt | 1m | Total number of key expiration events |
| Keyspace | Keys [Keyspace] | cnt | 1m | Number of keys in the key space |
| Stats | Latest Fork Duration Time | usec | 1m | Duration of the latest fork operation (microseconds) |
| Stats | Lookup Keys [Hit] | cnt | 1m | Number of successful key lookups in the main dictionary |
| Stats | Lookup Keys [Miss] | cnt | 1m | Number of failed key lookups in the main dictionary |
| Memory | Lua Engine Memory Used | bytes | 1m | Memory used by the Lua engine |
| Replication | Master Last Interaction Time Ago | sec | 1m | Elapsed time (seconds) since the last interaction with the master |
| Replication | Master Last Interaction Time Ago [Sync] | sec | 1m | Elapsed time (seconds) since the last transfer I/O with the master during a SYNC operation |
| Replication | Master Offset | pid | 1m | Current replication offset of the server |
| Replication | Master Second Offset | pid | 1m | Offset up to which replication IDs are accepted |
| Replication | Master Sync Left Bytes | bytes | 1m | Remaining bytes before synchronization completes |
| Memory | Memory Fragmentation Rate | % | 1m | Ratio between used_memory_rss and used_memory |
| Memory | Memory Fragmentation Rate [Allocator] | % | 1m | Allocator fragmentation ratio |
| Memory | Memory Fragmentation Used | bytes | 1m | Difference in bytes between used_memory_rss and used_memory |
| Memory | Memory Fragmentation Used [Allocator] | bytes | 1m | Allocator fragmentation in bytes |
| Memory | Memory Max Value | bytes | 1m | Memory limit |
| Memory | Memory Resident [Allocator] | bytes | 1m | Allocator resident memory |
| Memory | Memory RSS Rate [Allocator] | % | 1m | Allocator resident (RSS) ratio |
| Memory | Memory Used [Active] | bytes | 1m | Active memory |
| Memory | Memory Used [Allocated] | bytes | 1m | Allocated memory |
| Memory | Memory Used [Resident] | bytes | 1m | Resident memory in bytes |
| Stats | Network In Bytes [Total] | bytes | 1m | Total network input |
| Stats | Network Out Bytes [Total] | bytes | 1m | Total network output |
| Stats | Network Read Rate | cnt | 1m | Network read speed (KB/sec) |
| Stats | Network Write Rate | cnt | 1m | Network write speed (KB/sec) |
| Stats | Partial Resync Requests [Accepted] | cnt | 1m | Number of accepted partial resynchronization requests |
| Stats | Partial Resync Requests [Denied] | cnt | 1m | Number of denied partial resynchronization requests |
| Memory | Peak Memory Consumed | bytes | 1m | Maximum memory used by Redis |
| Stats | Processed Commands | cnt | 1m | Number of commands processed per second |
| Stats | Processed Commands [Total] | cnt | 1m | Total number of processed commands |
| Stats | Pub/Sub Channels | cnt | 1m | Global count of pub/sub channels with client subscriptions |
| Stats | Pub/Sub Patterns | cnt | 1m | Global count of pub/sub patterns with client subscriptions |
| Persistence | RDB Saved Duration Time [Current] | sec | 1m | Duration of the ongoing RDB save operation, if any (seconds) |
| Persistence | RDB Saved Duration Time [Last] | sec | 1m | Duration of the last RDB save operation (seconds) |
| Stats | Received Connections [Total] | cnt | 1m | Total number of received connections |
| Stats | Rejected Connections [Total] | cnt | 1m | Total number of rejected connections |
| Replication | Replication Backlog Active Count | cnt | 1m | Flag indicating whether the replication backlog is active |
| Replication | Replication Backlog Master Offset | cnt | 1m | Master offset of the replication backlog buffer |
| Replication | Replication Backlog Size | bytes | 1m | Data size of the replication backlog buffer (bytes) |
| Replication | Replication Backlog Size [Total] | bytes | 1m | Total size of the replication backlog buffer (bytes) |
| Replication | Slave Priority | cnt | 1m | Priority of the instance as a failover candidate |
| Replication | Slave Replication Offset | pid | 1m | Replication offset of the slave instance |
| Slowlog | Slow Operations | cnt | 1m | Number of slow tasks |
| Stats | Sockets [MIGRATE] | cnt | 1m | Number of sockets opened for migration |
| Stats | Tracked Keys [Expiry] | cnt | 1m | Number of keys tracked for expiration (applicable only to writable slaves) |
| State | Instance Status [PID] | PID | 1m | redis-server process pid |
| State | Sentinel Status [PID] | PID | 1m | sentinel process pid |
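Several items in the table above are derived from two raw values, such as the ratio between used_memory_rss and used_memory, or the lookup hit and miss counters. The following is a minimal sketch of how such values can be cross-checked by hand, assuming the raw numbers have already been read from the corresponding items. The function names and sample numbers are illustrative only and are not part of Cloud Monitoring, and whether the service reports the ratio as a plain number or multiplied by 100 is not stated in this guide.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Ratio between used_memory_rss and used_memory."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def fragmentation_bytes(used_memory_rss: int, used_memory: int) -> int:
    """Difference in bytes between used_memory_rss and used_memory."""
    return used_memory_rss - used_memory


def lookup_hit_ratio(hits: int, misses: int) -> float:
    """Share of key lookups that succeeded (hypothetical helper).

    Returns 0.0 when no lookups have been recorded yet.
    """
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    print(fragmentation_ratio(1500, 1000))
    print(fragmentation_bytes(1500, 1000))
    print(lookup_hit_ratio(90, 10))
```

A ratio well above 1 suggests that the operating system holds noticeably more memory for the process than the data actually occupies; the threshold that matters depends on the workload.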
Valkey
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Stats | Active Defragmentation Keys [Hits] | cnt | 1m | Number of keys that were actively defragmented |
| Stats | Active Defragmentation Keys [Miss] | cnt | 1m | Number of keys skipped by the active defragmentation process |
| Stats | Active Defragmentations [Hits] | cnt | 1m | Number of value reallocations performed by the active defragmentation process |
| Stats | Active Defragmentations [Miss] | cnt | 1m | Number of aborted value reallocations started by the active defragmentation process |
| Memory | Allocated Bytes [OS] | bytes | 1m | Number of bytes allocated by Valkey and recognized by the operating system (resident set size) |
| Memory | Allocated Bytes [Valkey] | bytes | 1m | Total bytes allocated by Valkey |
| Persistence | AOF Buffer Size | bytes | 1m | AOF buffer size |
| Persistence | AOF File Size [Current] | bytes | 1m | AOF current file size |
| Persistence | AOF File Size [Latest Startup] | bytes | 1m | AOF file size at the latest startup or rewrite |
| Persistence | AOF Rewrite Buffer Size | bytes | 1m | AOF rewrite buffer size |
| Persistence | AOF Rewrite Current Time | sec | 1m | Duration of the ongoing AOF rewrite operation, if any (seconds) |
| Persistence | AOF Rewrite Last Time | sec | 1m | Duration of the last AOF rewrite operation (seconds) |
| Commandstats | Calls | cnt | 1m | Number of calls that reached command execution (not rejected) |
| Commandstats | Calls [Failed] | cnt | 1m | Number of failed calls (Valkey 6.2-rc2) |
| Commandstats | Calls [Rejected] | cnt | 1m | Number of rejected calls (Valkey 6.2-rc2) |
| Persistence | Changes [Last Saved] | cnt | 1m | Number of changes since the last dump |
| Clients | Client Output Buffer [MAX] | cnt | 1m | Longest output list among current client connections |
| Clients | Client Input Buffer [MAX] | cnt | 1m | Biggest input buffer among current client connections (Valkey 5.0) |
| Sentinel | Clients [Sentinel] | cnt | 1m | Number of client connections (sentinel) |
| Replication | Connected Slaves | cnt | 1m | Number of connected slaves |
| Clients | Connections [Blocked] | cnt | 1m | Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH) |
| Clients | Connections [Current] | cnt | 1m | Number of client connections (excluding slave connections) |
| Persistence | Copy On Write Allocated Size [AOF] | bytes | 1m | Copy-on-write (COW) allocation size during the last AOF rewrite operation |
| Persistence | Copy On Write Allocated Size [RDB] | bytes | 1m | Copy-on-write (COW) allocation size during the last RDB save operation |
| Commandstats | CPU Time [Average] | cnt | 1m | Average CPU used per command execution |
| Commandstats | CPU Time [Total] | usec | 1m | Total CPU time used by these commands |
| CPU | CPU Usage [System Process] | % | 1m | System CPU used by background processes |
| CPU | CPU Usage [System] | % | 1m | System CPU used by the Valkey server |
| CPU | CPU Usage [User Process] | % | 1m | User CPU used by background processes |
| CPU | CPU Usage [User] | % | 1m | User CPU used by the Valkey server |
| Memory | Dataset Used | bytes | 1m | Dataset size |
| Disk | Disk Used | MB | 1m | datadir usage |
| Stats | Evicted Keys | cnt | 1m | Number of evicted keys caused by the maxmemory limit |
| Persistence | Fsyncs [Delayed] | cnt | 1m | Delayed fsync counter |
| Persistence | Fsyncs [Pending] | cnt | 1m | Number of pending fsync operations in the background I/O queue (format: bytes) |
| Stats | Full Resyncs | cnt | 1m | Number of full resynchronizations with the slave |
| Stats | Keys [Expired] | cnt | 1m | Total number of key expiration events |
| Keyspace | Keys [Keyspace] | cnt | 1m | Number of keys in the key space |
| Stats | Latest Fork Duration Time | usec | 1m | Duration of the latest fork operation (microseconds) |
| Stats | Lookup Keys [Hit] | cnt | 1m | Number of successful key lookups in the main dictionary |
| Stats | Lookup Keys [Miss] | cnt | 1m | Number of failed key lookups in the main dictionary |
| Memory | Lua Engine Memory Used | bytes | 1m | Memory used by the Lua engine |
| Replication | Master Last Interaction Time Ago | sec | 1m | Elapsed time (seconds) since the last interaction with the master |
| Replication | Master Last Interaction Time Ago [Sync] | sec | 1m | Elapsed time (seconds) since the last transfer I/O with the master during a SYNC operation |
| Replication | Master Offset | pid | 1m | Current replication offset of the server |
| Replication | Master Second Offset | pid | 1m | Offset up to which replication IDs are accepted |
| Replication | Master Sync Left Bytes | bytes | 1m | Remaining bytes before synchronization completes |
| Memory | Memory Fragmentation Rate | % | 1m | Ratio between used_memory_rss and used_memory |
| Memory | Memory Fragmentation Rate [Allocator] | % | 1m | Allocator fragmentation ratio |
| Memory | Memory Fragmentation Used | bytes | 1m | Difference in bytes between used_memory_rss and used_memory |
| Memory | Memory Fragmentation Used [Allocator] | bytes | 1m | Allocator fragmentation in bytes |
| Memory | Memory Max Value | bytes | 1m | Memory limit |
| Memory | Memory Resident [Allocator] | bytes | 1m | Allocator resident memory |
| Memory | Memory RSS Rate [Allocator] | % | 1m | Allocator resident (RSS) ratio |
| Memory | Memory Used [Active] | bytes | 1m | Active memory |
| Memory | Memory Used [Allocated] | bytes | 1m | Allocated memory |
| Memory | Memory Used [Resident] | bytes | 1m | Resident memory in bytes |
| Stats | Network In Bytes [Total] | bytes | 1m | Total network input |
| Stats | Network Out Bytes [Total] | bytes | 1m | Total network output |
| Stats | Network Read Rate | kbps | 1m | Network read speed (KB/sec) |
| Stats | Network Write Rate | kbps | 1m | Network write speed (KB/sec) |
| Stats | Partial Resync Requests [Accepted] | cnt | 1m | Number of accepted partial resynchronization requests |
| Stats | Partial Resync Requests [Denied] | cnt | 1m | Number of denied partial resynchronization requests |
| Memory | Peak Memory Consumed | bytes | 1m | Maximum memory used by Valkey |
| Stats | Processed Commands | cnt | 1m | Number of commands processed per second |
| Stats | Processed Commands [Total] | cnt | 1m | Total number of processed commands |
| Stats | Pub/Sub Channels | cnt | 1m | Global count of pub/sub channels with client subscriptions |
| Stats | Pub/Sub Patterns | cnt | 1m | Global count of pub/sub patterns with client subscriptions |
| Persistence | RDB Saved Duration Time [Current] | sec | 1m | Duration of the ongoing RDB save operation, if any (seconds) |
| Persistence | RDB Saved Duration Time [Last] | sec | 1m | Duration of the last RDB save operation (seconds) |
| Stats | Received Connections [Total] | cnt | 1m | Total number of received connections |
| Stats | Rejected Connections [Total] | cnt | 1m | Total number of rejected connections |
| Replication | Replication Backlog Active Count | cnt | 1m | Flag indicating whether the replication backlog is active |
| Replication | Replication Backlog Master Offset | cnt | 1m | Master offset of the replication backlog buffer |
| Replication | Replication Backlog Size | bytes | 1m | Data size of the replication backlog buffer |
| Replication | Replication Backlog Size [Total] | bytes | 1m | Total size of the replication backlog buffer |
| Replication | Slave Priority | cnt | 1m | Priority of the instance as a failover candidate |
| Replication | Slave Replication Offset | pid | 1m | Replication offset of the slave instance |
| Slowlog | Slow Operations | cnt | 1m | Number of slow tasks |
| Stats | Sockets [MIGRATE] | cnt | 1m | Number of sockets opened for migration |
| Stats | Tracked Keys [Expiry] | cnt | 1m | Number of keys tracked for expiration (applicable only to writable slaves) |
| State | Instance State [PID] | PID | 1m | Valkey-server process PID |
| State | Sentinel State [PID] | PID | 1m | Sentinel process PID |
Data Analytics type
Event Streams
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Broker | Connections [Zookeeper Client] | cnt | 1m | Number of ZooKeeper connections |
| Broker | Failed [Client Fetch Request] | cnt | 1m | Number of failed client fetch requests |
| Broker | Failed [Produce Request] | cnt | 1m | Number of failed producer requests |
| Broker | Incoming Messages | cnt | 1m | Number of messages received by the broker |
| Broker | Leader Elections | cnt | 1m | Number of leader elections |
| Broker | Leader Elections [Unclean] | cnt | 1m | Number of unclean leader elections |
| Broker | Log Flushes | cnt | 1m | Number of log flush occurrences |
| Broker | Network In Bytes | bytes | 1m | Total bytes received by the Topic |
| Broker | Network Out Bytes | bytes | 1m | Total bytes transmitted by the Topic |
| Broker | Rejected Bytes | bytes | 1m | Total bytes rejected by the Topic |
| Broker | Request Queue Length | cnt | 1m | Request queue size |
| Broker | Zookeeper Sessions [Closed] | cnt | 1m | ZooKeeper closed sessions per second |
| Broker | Zookeeper Sessions [Expired] | cnt | 1m | ZooKeeper expired sessions per second |
| Broker | Zookeeper Sessions [Readonly] | cnt | 1m | ZooKeeper read‑only sessions per second |
| Broker | Incoming Messages Rate [Topic] | cnt | 1m | Number of received messages per topic |
| Broker | Incoming Byte Rate [Second] | bytes | 1m | Incoming data per second |
| Broker | Outgoing Byte Rate [Second] | bytes | 1m | Outgoing data per second |
| Broker | Rejected Byte Rate [Second] | bytes | 1m | Bytes rejected per second |
| Disk | Disk Used | bytes | 1m | Datadir usage |
| State | AKHQ State [PID] | PID | 1m | akhq process pid |
| State | Instance State [PID] | PID | 1m | kafka process pid |
| State | Zookeeper State [PID] | PID | 1m | zookeeper process pid |
Search Engine
Elasticsearch
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Cluster | Shards | cnt | 1m | Number of cluster shards |
| Cluster | Shards [Primary] | cnt | 1m | Number of primary shards in the cluster |
| Cluster | Index [Total] | cnt | 1m | Number of indexes in the cluster |
| Cluster | License Expiry Date [ms] | ms | 1m | License expiration date (milliseconds) |
| Cluster | License Status | state | 1m | License status |
| Cluster | License Type | type | 1m | License type |
| FileSystem | Disk Usage | bytes | 1m | datadir usage |
| Node | Documents [Deleted] | cnt | 1m | Total number of deleted documents |
| Node | Documents [Existing] | cnt | 1m | Total number of existing documents |
| Node | Filesystem Bytes [Available] | bytes | 1m | Available file system size |
| Node | Filesystem Bytes [Free] | bytes | 1m | Free file system size |
| Node | Filesystem Bytes [Total] | bytes | 1m | Total file system size |
| Node | JVM Heap Used [Init] | bytes | 1m | Initial heap size of the JVM (bytes) |
| Node | JVM Heap Used [MAX] | bytes | 1m | Maximum heap size of the JVM (bytes) |
| Node | JVM Non Heap Used [Init] | bytes | 1m | Initial non-heap memory size of the JVM (bytes) |
| Node | JVM Non Heap Used [MAX] | bytes | 1m | Maximum non-heap memory size of the JVM (bytes) |
| Node | Segments | cnt | 1m | Total number of segments |
| Node | Segments Bytes | bytes | 1m | Total size of the segment |
| Node | Store Bytes | bytes | 1m | Total size of the store |
| State | Instance state [PID] | PID | 1m | Elasticsearch process pid |
| Task | Queue Time | ms | 1m | Queue time |
| Kibana | Kibana state [PID] | PID | 1m | Kibana process pid |
| Kibana | Kibana Connections | cnt | 1m | Number of connections |
| Kibana | Kibana Memory Heap Allocated [Limit] | bytes | 1m | Maximum old space size allocated to the Node.js process |
| Kibana | Kibana Memory Heap Allocated [Total] | bytes | 1m | Total allocated heap memory |
| Kibana | Kibana Memory Heap Used | bytes | 1m | Used heap memory |
| Kibana | Kibana Process Uptime | ms | 1m | Process uptime |
| Kibana | Kibana Requests [Disconnected] | cnt | 1m | Number of disconnected requests |
| Kibana | Kibana Requests [Total] | cnt | 1m | Total number of requests |
| Kibana | Kibana Response Time [Avg] | ms | 1m | Average response time |
| Kibana | Kibana Response Time [MAX] | ms | 1m | Maximum response time |
OpenSearch
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Cluster state | state | 1m | Cluster status |
| Cluster | Nodes | cnt | 1m | Number of nodes in the cluster |
| Cluster | Data nodes | cnt | 1m | Number of data nodes in the cluster |
| Cluster | Pending tasks | cnt | 1m | Number of pending tasks |
| Shard | Shards [active] | cnt | 1m | Number of active shards |
| Shard | Shards [active_primary] | cnt | 1m | Number of active primary shards |
| Shard | Shards [initializing] | cnt | 1m | Number of initializing shards |
| Shard | Shards [relocating] | cnt | 1m | Number of relocating shards |
| Shard | Shards [unassigned] | cnt | 1m | Number of unassigned shards |
| Thread | Thread Queue Count [search] | cnt | 1m | Number of search tasks in the queue |
| Thread | Thread Queue Count [refresh] | cnt | 1m | Number of refresh tasks in the queue |
| Thread | Thread Queue Count [write] | cnt | 1m | Number of write operations in the queue |
| Thread | Thread Queue Count [get] | cnt | 1m | Number of get tasks in the queue |
| Thread | Thread Queue Count [snapshot] | cnt | 1m | Number of snapshot jobs in the queue |
| Thread | Thread Queue Count [flush] | cnt | 1m | Number of flush operations in the queue |
| Thread | Thread Queue Count [force_merge] | cnt | 1m | Number of force_merge tasks in the queue |
| System | CPU usage | % | 1m | CPU usage |
| System | Memory usage | bytes | 1m | Used memory |
| System | Disk available | bytes | 1m | Available disk space |
| Documents | Documents indexing rate | cnt | 1m | Number of indexed documents |
| Documents | Documents indexing rate [Delta] | cnt | 1m | Number of indexed documents (delta value) |
| Documents | Indexing latency | sec | 1m | Time taken to index documents |
| Documents | Indexing latency [Delta] | sec | 1m | Time taken to index the document (delta value) |
| Documents | Search rate | cnt | 1m | Number of search queries |
| Documents | Search rate [Delta] | cnt | 1m | Number of search queries (delta value) |
| Documents | Search latency | sec | 1m | Time taken during the query |
| Documents | Search latency [Delta] | sec | 1m | Time taken during the query (delta value) |
| Documents | Document count (with replicas) | cnt | 1m | Total number of documents |
| Documents | Document deleting rate | cnt | 1m | Number of deleted documents |
| Documents | Document deleting rate [Delta] | cnt | 1m | Number of deleted documents (delta value) |
| Documents | Document merging rate | cnt | 1m | Number of merged documents |
| Documents | Document merging rate [Delta] | cnt | 1m | Number of merged documents (delta value) |
| JVM | Heap used | bytes | 1m | Memory used in the heap |
| JVM | GC count [young] | cnt | 1m | Number of young GC collections |
| JVM | GC count [young] [Delta] | cnt | 1m | Young GC collection count (delta value) |
| JVM | GC count [G1] | cnt | 1m | G1 GC collection count |
| JVM | GC count [G1] [Delta] | cnt | 1m | G1 GC collection count (delta value) |
| JVM | GC count [old] | cnt | 1m | Number of old GC collections |
| JVM | GC count [old] [Delta] | cnt | 1m | Number of old GC collections (delta value) |
| JVM | GC time [young] | cnt | 1m | Time spent on young GC collection |
| JVM | GC time [young] [Delta] | cnt | 1m | Time spent for young GC collection (delta value) |
| JVM | GC time [G1] | cnt | 1m | Time spent on G1 GC collection |
| JVM | GC time [G1] [Delta] | cnt | 1m | Time spent on G1 GC collection (delta value) |
| JVM | GC time [old] | cnt | 1m | Time spent on old GC collections |
| JVM | GC time [old] [Delta] | cnt | 1m | Time spent on old GC collections (delta value) |
| State | Instance state [PID] | PID | 1m | OpenSearch process PID |
| State | Dashboard state [PID] | PID | 1m | Dashboard process PID |
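The [Delta] items in the table above are described as delta values of the corresponding cumulative items. How Cloud Monitoring computes them internally is not documented here; the sketch below only illustrates the general idea of turning a series of cumulative samples into per-interval deltas. The function name `deltas` and the rule for handling a counter reset are assumptions made for this example.

```python
from typing import List


def deltas(samples: List[float]) -> List[float]:
    """Return the differences between consecutive cumulative samples.

    Assumption for this sketch: if a sample is smaller than the previous
    one, the counter is treated as having restarted from zero (for example
    after a process restart), so the new sample itself is used as the delta.
    """
    result = []
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            result.append(current - previous)
        else:
            result.append(current)
    return result


if __name__ == "__main__":
    # Illustrative cumulative counts collected once per interval.
    print(deltas([100, 130, 190, 20]))
```

A series with fewer than two samples yields an empty list, because at least two collection points are needed to form one delta.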
Vertica(DBaaS)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State [PID] | PID | 1m | Vertica process PID |
| Activelock | Active Locks | cnt | 1m | Number of active locks |
| Activesession | Active Sessions | cnt | 1m | Number of Active Sessions |
| Tablespace | Data Tablespace Used | MB | 1m | Data, Temp Tablespace usage |
| Tablespace | Catalog Tablespace Used | MB | 1m | Catalog Tablespace Usage |
Container type
Kubernetes Engine
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Cluster | Cluster Namespaces [Active] | cnt | 5m | Number of namespaces in active state |
| Cluster | Cluster Namespaces [Total] | cnt | 5m | Total number of namespaces in the cluster |
| Cluster | Cluster Nodes [Ready] | cnt | 5m | Number of nodes in READY state |
| Cluster | Cluster Nodes [Total] | cnt | 5m | Total number of nodes in the cluster |
| Cluster | Cluster Pods [Failed] | cnt | 5m | Number of failed-state pods in the cluster |
| Cluster | Cluster Pods [Pending] | cnt | 5m | Number of pending pods in the cluster |
| Cluster | Cluster Pods [Running] | cnt | 5m | Number of pods in running state within the cluster |
| Cluster | Cluster Pods [Succeeded] | cnt | 5m | Number of succeeded pods in the cluster |
| Cluster | Cluster Pods [Unknown] | cnt | 5m | Number of pods in unknown state within the cluster |
| Cluster | Instance State | state | 5m | cluster status |
| Namespace | Namespace Pods [Failed] | cnt | 5m | Number of failed-state pods in a namespace |
| Namespace | Namespace Pods [Pending] | cnt | 5m | Number of pending pods in the namespace |
| Namespace | Namespace Pods [Running] | cnt | 5m | Number of running pods in a namespace |
| Namespace | Namespace Pods [Succeeded] | cnt | 5m | Number of succeeded pods in the namespace |
| Namespace | Namespace Pods [Unknown] | cnt | 5m | Number of unknown-state pods in the namespace |
| Namespace | Namespace GPU Clock Frequency | MHz | 5m | SM clock frequency in the Namespace |
| Namespace | Namespace GPU Memory Usage | % | 5m | Memory utilization in Namespace |
| Node | Node CPU Size [Allocatable] | cnt | 5m | Node allocatable CPU |
| Node | Node CPU Size [Capacity] | cnt | 5m | CPU capacity within the node |
| Node | Node CPU Usage | % | 5m | CPU usage on the node |
| Node | Node CPU Usage [Request] | % | 5m | CPU request_ratio within node |
| Node | Node CPU Used | state | 5m | CPU utilization within the node |
| Node | Node Filesystem Usage | % | 5m | File system usage within the node |
| Node | Node Memory Size [Allocatable] | bytes | 5m | Allocatable memory within the node |
| Node | Node Memory Size [Capacity] | bytes | 5m | Memory capacity within the node |
| Node | Node Memory Usage | % | 5m | Node memory utilization |
| Node | Node Memory Usage [Request] | % | 5m | memory request_ratio within the node |
| Node | Node Memory Workingset | bytes | 5m | memory working set within the node |
| Node | Node Network In Bytes | bytes | 5m | Node network rx bytes |
| Node | Node Network Out Bytes | bytes | 5m | Node network tx bytes |
| Node | Node Network Total Bytes | bytes | 5m | Node network total bytes |
| Node | Node Pods [Failed] | cnt | 5m | Number of pods in failed state within a node |
| Node | Node Pods [Pending] | cnt | 5m | Number of pending pods in the node |
| Node | Node Pods [Running] | cnt | 5m | Number of running pods per node |
| Node | Node Pods [Succeeded] | cnt | 5m | Number of succeeded pods in the node |
| Node | Node Pods [Unknown] | cnt | 5m | Number of pods in unknown state on the node |
| Pod | Pod CPU Usage [Limit] | % | 5m | CPU usage_limit_ratio within the pod |
| Pod | Pod CPU Usage [Request] | % | 5m | CPU request_ratio within the pod |
| Pod | Pod CPU Usage | mc | 5m | CPU usage within the pod |
| Pod | Pod Memory Usage [Limit] | % | 5m | memory usage_limit_ratio in the pod |
| Pod | Pod Memory Usage [Request] | % | 5m | memory request_ratio in pod |
| Pod | Pod Memory Usage | bytes | 5m | Memory usage within the pod |
| Pod | Pod Network In Bytes | bytes | 5m | network rx bytes in pod |
| Pod | Pod Network Out Bytes | bytes | 5m | network tx bytes in pod |
| Pod | Pod Network Total Bytes | bytes | 5m | Network total bytes in pod |
| Pod | Pod Restart Containers | cnt | 5m | container restart count in pod |
| Workload | Workload Pods [Running] | cnt | 5m | - |
Container Registry
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Container Registry | Image Pulls [Denied] | cnt | 1m | Number of rejected Image Tag (digest) Pulls |
| Container Registry | Image Pushes [Allowed] | cnt | 1m | Number of allowed Image Tag (digest) Pushes |
| Container Registry | Image Pushes [Denied] | cnt | 1m | Number of rejected Image Tag (digest) Pushes |
| Container Registry | Image Scans [Allowed] | cnt | 1m | Number of allowed Image Tag (digest) scans |
| Container Registry | Image Scans [Denied] | cnt | 1m | Number of rejected Image Tag (digest) scans |
| Container Registry | Image Tags [Deleted] | cnt | 1m | Number of deleted Image Tag (digest) |
| Container Registry | Images [Created] | cnt | 1m | Number of created images |
| Container Registry | Images [Deleted] | cnt | 1m | Number of deleted images |
| Container Registry | Logins [Allowed] | cnt | 1m | Number of allowed Registry Logins |
| Container Registry | Logins [Denied] | cnt | 1m | Number of denied Registry Logins |
| Container Registry | Repositories [Created] | cnt | 1m | Number of created repositories |
| Container Registry | Repositories [Deleted] | cnt | 1m | Number of deleted repositories |
| State | Instance State | state | 1m | Check status |
Networking type
Internet Gateway
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Internet Gateway | Network In Total Bytes [Internet Delta] | bytes | 5m | Internet Gateway → VPC cumulative traffic volume over 5 minutes (Internet) ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Internet Gateway | Network In Total Bytes [Internet] | bytes | 5m | rx bytes total |
| Internet Gateway | Network Out Total Bytes [Internet Delta] | bytes | 5m | VPC → Internet Gateway cumulative traffic volume over 5 minutes (Internet) ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Internet Gateway | Network Out Total Bytes [Internet] | bytes | 5m | tx bytes total |
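
The ※ conversion formula in the [Delta] rows above can be sketched in Python as follows. The function name `delta_bytes_to_avg_bps` and the sample value are illustrative assumptions and are not part of Cloud Monitoring. The 300-second default corresponds to the 5m collection interval.

```python
def delta_bytes_to_avg_bps(delta_bytes: int, interval_seconds: int = 300) -> float:
    """Average bits per second over one collection interval.

    Implements: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return delta_bytes / interval_seconds * 8


# Hypothetical sample: 37,500,000 bytes accumulated in one 5-minute interval.
print(delta_bytes_to_avg_bps(37_500_000))  # 1000000.0, i.e. 1 Mbps on average
```

The same calculation applies to any other [Delta] item that uses this formula, as long as `interval_seconds` matches that item's collection interval.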
Load Balancer(OLD)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Load Balancer | Current Connection | cnt | 5m | Current number of connections |
| Load Balancer | Total Connection | cnt | 5m | Total number of connections |
| Load Balancer | Total Connection [Delta] | cnt | 5m | Total number of connections (delta value) |
| Load Balancer | Network In Bytes | bytes | 5m | in bytes |
| Load Balancer | Network In Bytes [Delta] | bytes | 5m | Client → Load Balancer cumulative traffic volume over 5 minutes ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Load Balancer | Network Out Bytes | bytes | 5m | out bytes |
| Load Balancer | Network Out Bytes [Delta] | bytes | 5m | Cumulative traffic volume over 5 minutes from Load Balancer to Client ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Load Balancer | Instance State | state | 5m | Load Balancer status |
Load Balancer Listener(OLD)
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Listener | Connections [Current] | cnt | 5m | Current number of connections |
| Listener | Connections [Total Delta] | cnt | 5m | total connection count (delta value) |
| Listener | Connections [Total] | cnt | 5m | total connection count |
| Listener | Instance State | state | 5m | LB Listener status |
| Listener | Network In Bytes | bytes | 5m | in bytes |
| Listener | Network In Bytes [Delta] | bytes | 5m | Cumulative traffic volume over 5 minutes from Client to Load Balancer ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Listener | Network Out Bytes | bytes | 5m | out bytes |
| Listener | Network Out Bytes [Delta] | bytes | 5m | Load Balancer → Client cumulative traffic volume over 5 minutes ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
Direct Connect
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Direct Connect | Network In Bytes | bytes | 5m | Cumulative traffic volume from Direct Connect → VPC |
| Direct Connect | Network In Bytes [Delta] | bytes | 5m | Direct Connect → VPC cumulative traffic volume over 5 minutes ※ Traffic bps average conversion formula: Cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
| Direct Connect | Network Out Bytes | bytes | 5m | Cumulative traffic volume from VPC to Direct Connect |
| Direct Connect | Network Out Bytes [Delta] | bytes | 5m | VPC → Direct Connect cumulative traffic volume over 5 minutes ※ Traffic bps average conversion formula: cumulative traffic volume (bytes) / 300 (seconds) * 8 (bits) |
Load Balancer
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State | state | 5m | LB status |
| Load Balancer | Current Connection | cnt | 5m | Current number of connections |
| Load Balancer | Total L4 Connection | cnt | 5m | Total L4 Connection count |
| Load Balancer | Total L7 Connection | cnt | 5m | Total number of L7 connections |
| Load Balancer | Total TCP Connection | cnt | 5m | Total number of TCP connections |
| Load Balancer | Total Connection | cnt | 5m | Total number of connections |
| Load Balancer | Bytes processed in forward direction | bytes | 5m | Forward Network Byte |
| Load Balancer | Packets processed in forward direction | cnt | 5m | Forward Network packet |
| Load Balancer | Bytes processed in reverse direction | bytes | 5m | Reverse Network Byte |
| Load Balancer | Packets processed in reverse direction | cnt | 5m | Reverse Network packet |
| Load Balancer | Total failure actions | cnt | 5m | Total number of failures |
| Load Balancer | Current Request | cnt | 5m | Current request count |
| Load Balancer | Current response | cnt | 5m | Current Response count |
| Load Balancer | Total Request | cnt | 5m | Total number of requests |
| Load Balancer | Total Request Success | cnt | 5m | Total number of successful requests |
| Load Balancer | Peak Connection | cnt | 5m | Maximum number of connections |
| Load Balancer | Current Connection Rate | % | 5m | Current SSL Connection rate |
| Load Balancer | Last response time | ms | 5m | Last response time |
| Load Balancer | Fastest response time | ms | 5m | Shortest response time |
| Load Balancer | Slowest response time | ms | 5m | Maximum response time |
| Load Balancer | Current SSL Connection | cnt | 5m | Current number of SSL connections |
| Load Balancer | Total SSL Connection | cnt | 5m | Total number of SSL connections |
| Load Balancer | Bytes processed in forward direction [Delta] | bytes | 5m | Forward Network Byte (delta value) |
| Load Balancer | Packets processed in forward direction [Delta] | cnt | 5m | Forward Network packet (delta value) |
| Load Balancer | Bytes processed in reverse direction [Delta] | bytes | 5m | Reverse Network Byte (delta value) |
| Load Balancer | Packets processed in reverse direction [Delta] | cnt | 5m | Reverse Network packet (delta value) |
Load Balancer Listener
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State | state | 5m | LB status |
| Load Balancer | Current Connection | cnt | 5m | Current number of connections |
| Load Balancer | Total L4 Connection | cnt | 5m | Total L4 Connection count |
| Load Balancer | Total L7 Connection | cnt | 5m | Total number of L7 connections |
| Load Balancer | Total TCP Connection | cnt | 5m | Total number of TCP connections |
| Load Balancer | Total Connection | cnt | 5m | Total number of connections |
| Load Balancer | Bytes processed in forward direction | bytes | 5m | Forward Network Byte |
| Load Balancer | Packets processed in forward direction | cnt | 5m | Forward Network packet |
| Load Balancer | Bytes processed in reverse direction | bytes | 5m | Reverse Network Byte |
| Load Balancer | Packets processed in reverse direction | cnt | 5m | Reverse Network packet |
| Load Balancer | Total failure actions | cnt | 5m | Total number of failures |
| Load Balancer | Current Request | cnt | 5m | Current request count |
| Load Balancer | Current response | cnt | 5m | Current Response count |
| Load Balancer | Total Request | cnt | 5m | Total number of requests |
| Load Balancer | Total Request Success | cnt | 5m | Total number of successful requests |
| Load Balancer | Peak Connection | cnt | 5m | Maximum number of connections |
| Load Balancer | Current Connection Rate | % | 5m | Current SSL Connection rate |
| Load Balancer | Last response time | ms | 5m | Last response time |
| Load Balancer | Fastest response time | ms | 5m | Shortest response time |
| Load Balancer | Slowest response time | ms | 5m | Maximum response time |
| Load Balancer | Current SSL Connection | cnt | 5m | Current number of SSL connections |
| Load Balancer | Total SSL Connection | cnt | 5m | Total number of SSL connections |
| Load Balancer | Bytes processed in forward direction [Delta] | bytes | 5m | Forward Network Byte (delta value) |
| Load Balancer | Packets processed in forward direction [Delta] | cnt | 5m | Forward Network packet (delta value) |
| Load Balancer | Bytes processed in reverse direction [Delta] | bytes | 5m | Reverse Network Byte (delta value) |
| Load Balancer | Packets processed in reverse direction [Delta] | cnt | 5m | Reverse Network packet (delta value) |
Load Balancer Server Group
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Server Group | Instance State | state | 5m | LB Server Group status |
| Server Group | Peak Connection | cnt | 5m | Maximum connections per server group |
| Server Group | Healthy host | cnt | 5m | Number of healthy hosts in server group |
| Server Group | Unhealthy host | cnt | 5m | Number of abnormal hosts in server group |
| Server Group | Request Count | cnt | 5m | Number of requests |
| Server Group | Response Count | cnt | 5m | Response count |
| Server Group | 2xx Response Count | cnt | 5m | 2xx response count |
| Server Group | 3xx Response Count | cnt | 5m | Number of 3xx responses |
| Server Group | 4xx Response Count | cnt | 5m | 4xx response count |
| Server Group | 5xx Response Count | cnt | 5m | Number of 5xx responses |
Cloud WAN
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| State | Instance State | state | 10m | Attachment connection status |
| Attachment | Network in bytes | bytes | 10m | In bytes (inbound traffic usage per interval) |
| Attachment | Network out bytes | bytes | 10m | Out bytes (outbound traffic usage per interval) |
| Attachment | Network In Packets [Dropped] | cnt | 10m | In dropped packet count (number of dropped packets per interval) |
| Attachment | Network Out Packets [Dropped] | cnt | 10m | Out dropped packet count (number of dropped packets per interval) |
| Attachment | Network In Packets [Unicast] | cnt | 10m | In unicast packet count (number of unicast packets per interval) |
| Attachment | Network Out Packets [Unicast] | cnt | 10m | Out unicast packet count (number of unicast packets per interval) |
| Attachment | Network In Packets [Broadcast] | cnt | 10m | In broadcast packet count (number of broadcast packets per interval) |
| Attachment | Network Out Packets [Broadcast] | cnt | 10m | Out broadcast packet count (number of broadcast packets per interval) |
| Attachment | Network In Packets [Multicast] | cnt | 10m | In multicast packet count (number of multicast packets per interval) |
| Attachment | Network Out Packets [Multicast] | cnt | 10m | Out multicast packet count (number of multicast packets per interval) |
| Attachment | Network In Error Packets | cnt | 10m | In error packet count (number of received error packets per interval) |
| Attachment | Network Out Error Packets | cnt | 10m | Out error packet count (number of transmitted error packets per interval) |
Global CDN
| Performance Item Group Name | Performance item name | collection unit | Collection interval | Explanation |
|---|---|---|---|---|
| Global CDN | Instance State | state | 5m | Global CDN status |
| Global CDN | Data Transfer Bytes | bytes | 5m | Data transfer volume transmitted via CDN service (originBytes) |
| Global CDN | Requests [Total] | cnt | 5m | Number of service requests received by the CDN service (originHits) |
2.9 - Appendix C. Service-specific Status Checks
Compute type
Virtual Server
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [Basic] | Instance status | NOSTATE, RUNNING, BLOCKED, PAUSED, SHUTDOWN, SHUTOFF, CRASHED, PMSUSPENDED, LAST |
GPU Server
| Performance item name | description | value |
|---|---|---|
| Instance State [Basic] | Instance status | NOSTATE, RUNNING, BLOCKED, PAUSED, SHUTDOWN, SHUTOFF, CRASHED, PMSUSPENDED, LAST |
Bare Metal Server
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Multi-node GPU Cluster [Cluster Fabric]
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Multi-node GPU Cluster [Node]
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Storage type
File Storage
| Performance item name | description | value |
|---|---|---|
| Instance State | File Storage Volume Status | 1: Online * 0: Other status values (Offline) |
Object Storage
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Block Storage(BM)
| Performance item name | Explanation | value |
|---|---|---|
| Instance State | Blockstorage volume status | 1: running (normal) * 0: down (abnormal) |
Block Storage(VM)
| Performance item name | description | value |
|---|---|---|
| Instance State | Blockstorage volume status | 1: running (normal) * 0: down (abnormal) |
Database type
PostgreSQL(DBaaS)
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [PID] | postgres process PID | PID: when the postgres process exists * -1: when the process does not exist |
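
The value column above follows a simple convention: the item reports the process PID while the process exists and -1 otherwise. A minimal Python sketch of reading such a value is shown below. The helper name `is_process_running` is hypothetical, and the check applies only to items that follow this PID / -1 convention.

```python
def is_process_running(pid_value: int) -> bool:
    """Interpret a [PID] status item: any value other than -1 is a live PID."""
    return pid_value != -1


# Hypothetical sample values: a reported PID and the "not running" marker.
for sample in (4321, -1):
    print(sample, "running" if is_process_running(sample) else "not running")
```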
MariaDB(DBaaS)
| Performance item name | description | value |
|---|---|---|
| Safe PID | mariadb_safe process PID | PID: when the mariadb_safe process exists * -1: when the process does not exist |
| Instance State [PID] | mariadb process PID | PID: when the mariadb process exists * -1: when the process does not exist |
MySQL(DBaaS)
| Performance item name | description | value |
|---|---|---|
| Instance State [PID] | mysqld process PID | PID: when the mysqld process exists * -1: when the process does not exist |
Microsoft SQL Server(DBaaS)
| Performance item name | description | value |
|---|---|---|
| Instance State [Cluster] | MSSQL cluster configuration status | PID: when the mssql process exists * -1: when the process does not exist |
| Instance State [PID] | sqlservr.exe process PID | For Microsoft SQL Server, a PID is also running on the secondary server, so the status cannot be determined from the PID alone. |
EPAS(DBaaS)
| Performance item name | description | value |
|---|---|---|
| Instance State [PID] | postgres process PID | PID: when the postgres process exists * -1: when the process does not exist |
CacheStore(DBaaS)
Redis
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [PID] | Redis-server process PID | -1: If the process does not exist |
| Sentinel State [PID] | Sentinel process PID | -1: when the process does not exist |
Valkey
| Performance item name | description | value |
|---|---|---|
| Instance State [PID] | Valkey-server process PID | -1: If the process does not exist |
| Sentinel State [PID] | Sentinel process PID | -1: when the process does not exist |
Data Analytics type
Event Streams
| Performance item name | description | value |
|---|---|---|
| AKHQ State [PID] | akhq process PID | PID: when the akhq process exists * -1: when the process does not exist |
| Instance State [PID] | Kafka process PID | PID: when the kafka process exists * -1: when the process does not exist |
| Zookeeper State [Pid] | zookeeper process PID | PID: when the zookeeper process exists * -1: when the process does not exist |
Search Engine
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [PID] | Elasticsearch process PID | PID: if the Elasticsearch process exists * -1: if the process does not exist |
| Kibana State [PID] | Kibana process PID | PID: if the Kibana process exists * -1: if the process does not exist |
Elasticsearch
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [PID] | Elasticsearch process PID | -1: when the process does not exist |
| Kibana State [PID] | Dashboard process PID | -1: if the process does not exist |
Opensearch
| Performance item name | description | value |
|---|---|---|
| Instance State [PID] | Opensearch process PID | -1: If the process does not exist |
| Dashboard State [PID] | Dashboard process PID | -1: when the process does not exist |
Vertica(DBaaS)
| Performance item name | Explanation | value |
|---|---|---|
| Instance State [PID] | Vertica process PID | -1: if the process does not exist |
Container type
Kubernetes Engine
| Performance item name | description | value |
|---|---|---|
| Instance State | cluster status | 1: If the health check query sum(up{job="kubernetes-apiservers"}) returns a value greater than 0 * 0: If the health check query sum(up{job="kubernetes-apiservers"}) returns a value less than or equal to 0 |
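
The health check rule above reduces to a threshold on the query result. Below is a minimal Python sketch, assuming the result of the query has already been fetched as a number. `cluster_instance_state` is a hypothetical name, and this guide does not describe how Cloud Monitoring itself evaluates the query.

```python
# The health check query quoted in the table, kept here for reference only.
HEALTH_CHECK_QUERY = 'sum(up{job="kubernetes-apiservers"})'


def cluster_instance_state(query_result: float) -> int:
    """Return 1 when the query result is greater than 0, otherwise 0."""
    return 1 if query_result > 0 else 0


# Hypothetical results: three API servers up, then none up.
print(cluster_instance_state(3))  # 1
print(cluster_instance_state(0))  # 0
```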
Container Registry
| Performance item name | description | value |
|---|---|---|
| Instance State | Container Registry status | 1: running (normal) * 0: down (abnormal) |
Networking type
Internet Gateway
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Load Balancer(OLD)
| Performance item name | description | value |
|---|---|---|
| Instance State | Load Balancer status | Determine based on provisioning_status in the API response * 1: ACTIVE * 0: ETC |
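
The rule above (1 for ACTIVE, 0 for everything else) can be sketched in Python as follows. The function name `lb_instance_state` is hypothetical, the sample status strings other than ACTIVE are illustrative, and an exact, case-sensitive match on `ACTIVE` is an assumption. The shape of the API response that carries `provisioning_status` is not shown in this guide, so the sketch takes the status string directly.

```python
def lb_instance_state(provisioning_status: str) -> int:
    """Map a provisioning_status value to the Instance State: 1 = ACTIVE, 0 = ETC."""
    return 1 if provisioning_status == "ACTIVE" else 0


# Illustrative inputs; only ACTIVE maps to 1.
for status in ("ACTIVE", "ERROR", ""):
    print(repr(status), "->", lb_instance_state(status))
```

The Load Balancer Listener and Server Group status items in this appendix state the same rule, so the same mapping would apply to them.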
Load Balancer Listener(OLD)
| Performance item name | Explanation | value |
|---|---|---|
| Instance State | Load Balancer Listener status | Determine based on provisioning_status in the API response * 1: ACTIVE * 0: ETC |
Load Balancer
| Performance item name | description | value |
|---|---|---|
| Instance State | Load Balancer status | Determine based on provisioning_status in the API response * 1: ACTIVE * 0: ETC |
Load Balancer Listener
| Performance item name | description | value |
|---|---|---|
| Instance State | Load Balancer Listener status | Determine based on provisioning_status in the API response * 1: ACTIVE * 0: ETC |
Load Balancer Server Group
| Performance item name | Explanation | value |
|---|---|---|
| Instance State | Status of Load Balancer Server Group | Determine based on provisioning_status in the API response * 1: ACTIVE * 0: ETC |
Direct Connect
| Performance item name | description | value |
|---|---|---|
| N/A | N/A | N/A |
Cloud WAN
| Performance item name | description | value |
|---|---|---|
| Instance State | Attachment connection status | 0: down * 1: up * 2: testing * 3: unknown |
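
The four status codes above can be decoded with a small enumeration. This is a minimal Python sketch. The names `AttachmentState` and `describe_attachment_state` are hypothetical, and returning "unrecognized" for values outside 0 to 3 is an assumption, since the table does not define any other codes.

```python
from enum import IntEnum


class AttachmentState(IntEnum):
    """Cloud WAN attachment connection status codes from the table above."""
    DOWN = 0
    UP = 1
    TESTING = 2
    UNKNOWN = 3


def describe_attachment_state(value: int) -> str:
    """Return the lowercase label for a status code, or 'unrecognized'."""
    try:
        return AttachmentState(value).name.lower()
    except ValueError:
        # Codes outside 0..3 are not defined in the table.
        return "unrecognized"


print(describe_attachment_state(1))  # up
```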
Global CDN
| Performance item name | description | value |
|---|---|---|
| Instance State | Global CDN status | 1: running (normal) * 0: down (abnormal) |
3 - API Reference
4 - Release Note
Cloud Monitoring
- In July 2025, we added services integrated with Cloud Monitoring.
- Additional integrated services: Compute (Multi-node GPU Cluster [Cluster Fabric], Multi-node GPU Cluster [Node]), Storage (Block Storage(BM), Block Storage(VM)), Networking (Cloud WAN, Global CDN), Database (Valkey), Data Analytics (Opensearch, Vertica(DBaaS))
- In February 2025, we added services integrated with Cloud Monitoring.
- Additional integrated services: Container (Container Registry), Database (EPAS, Microsoft SQL Server), Data Analytics (Event Streams, Search Engine), Networking (Load Balancer, Load Balancer Listener, Load Balancer Server Group, VPN)
- We have launched the Cloud Monitoring service. It collects usage status and change information of operational infrastructure resources, and supports a stable cloud operating environment by generating and notifying events when configured thresholds are exceeded.