Usage Management and Optimization

Existing workload usage prediction and management

Periodic usage increase prediction and management

Generally, as cloud usage increases, costs also increase.

Outside of exceptional economic conditions, most businesses grow over time and their data keeps accumulating, so demand for cloud resources grows as well.

If this demand is not managed properly, costs are likely to grow out of control.

Therefore, it is necessary to periodically predict cloud usage and take appropriate measures accordingly.

Review Cycle Definition

First, define the review cycle for cloud resources, then allocate time and resources to cost optimization according to that cycle.

When setting the review cycle, you can refer to the discount contract cycle of cloud resources.

For example, Samsung Cloud Platform provides 1-year and 3-year contract discounts through Cost Savings and Planned Compute.

Setting a quarterly or semi-annual resource adjustment cycle anchored on the expiration date of these contracts makes it easier to establish a resource operation plan.

Additionally, if the review cycle is aligned with the company's accounting period, the results can also be used for performance evaluation by organizational unit.

When setting the review cycle, software license agreements, operation management contract periods, etc. should also be considered.

For example, all resources can be reviewed on a 12-month cycle, while data storage can be reviewed on a 6-month cycle.
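As a rough illustration of anchoring review dates on a contract expiration date, the following sketch counts back from the expiry in fixed cycles. The function names `add_months` and `review_dates` and the sample date are hypothetical and not part of any Samsung Cloud Platform API.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_dates(contract_end, cycle_months, count):
    """Return `count` review dates counted back from contract expiry, earliest first."""
    return [
        add_months(contract_end, -cycle_months * i)
        for i in range(count - 1, -1, -1)
    ]


# Hypothetical 1-year contract ending 2026-12-31, reviewed every 6 months.
schedule = review_dates(date(2026, 12, 31), cycle_months=6, count=3)
for review_day in schedule:
    print(review_day.isoformat())
```

In practice the last review would usually be scheduled some weeks before the expiry date so that renewal decisions can still be acted on; that lead time is left out here to keep the sketch short.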

Usage Increase Prediction

The factor with the greatest impact on cloud costs is growth in usage.

Resources in the currently operating cloud environment must be adjusted to reflect the impact of that growth.

You can make the prediction using the following procedure.

  1. Evaluate whether all expenditure items are properly classified according to the organization’s departments and processes, and make corrections as necessary.

  2. Calculate the average expenditure for each group over the past 3 months.

  3. Calculate the average expenditure for each group over the same period last year.

  4. Compare the two averages to understand the trend and reflect the growth rate of each group.

  5. Adjust resource quantities and rate plans to reflect future plans.

  6. Share the results with FinOps personnel and reconcile any differing opinions.

In step 5, adjust both the rate plan and the resource quantities to reflect usage that is newly added or retired.
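Steps 2 to 5 can be sketched as a small calculation. This is a minimal illustration, assuming that the growth rate is the ratio of the two 3-month averages and that the projection simply applies that rate to the recent average; the function names, the `planned_delta` parameter, and the sample figures are hypothetical.

```python
from statistics import mean


def growth_rate(recent_3m, same_3m_last_year):
    """Year-over-year growth rate from two 3-month spend windows (steps 2-4)."""
    prior_avg = mean(same_3m_last_year)
    if prior_avg == 0:
        raise ValueError("no spend in the comparison period")
    return mean(recent_3m) / prior_avg - 1.0


def forecast_monthly_spend(recent_3m, same_3m_last_year, planned_delta=0.0):
    """Project monthly spend one year ahead for a single group.

    planned_delta is the expected monthly cost change from planned
    resource or rate-plan adjustments (step 5); positive means added cost.
    """
    rate = growth_rate(recent_3m, same_3m_last_year)
    return mean(recent_3m) * (1.0 + rate) + planned_delta


# Hypothetical monthly spend per group, in arbitrary currency units:
# (last 3 months, same 3 months one year earlier).
groups = {
    "analytics": ([120.0, 130.0, 140.0], [100.0, 100.0, 100.0]),
    "web": ([50.0, 50.0, 50.0], [50.0, 50.0, 50.0]),
}
for name, (recent, prior) in groups.items():
    rate = growth_rate(recent, prior)
    projected = forecast_monthly_spend(recent, prior)
    print(f"{name}: growth {rate:.1%}, projected monthly spend {projected:.2f}")
```

A single growth rate per group is a deliberate simplification; groups with strong seasonality or planned launches may need a more detailed model before the figures are shared in step 6.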

FinOps cost modeling is covered in detail in "III. Establishing and executing FinOps strategies", section "1.3 Cost modeling".

Review of new services, features, and configuration

Samsung Cloud Platform, like most cloud providers, continuously adds new technologies and services.

Some of these enable new business experiments, and some help improve the performance of existing resources.

To keep workloads cost-effective, regularly review whether new services, features, and components can be adopted.

External influence-based usage prediction and management

Analyzing external factors that affect cloud usage is also important.

To do this, understand the patterns and characteristics of the computing workload, and define response time as a key performance indicator for determining whether demand is fluctuating.

It is also necessary to analyze the predictability, repeatability, speed of change, and scale of external influences.

Set the analysis period long enough (more than 1 year) to capture seasonality.

Based on this analysis, resources can be adjusted to match the predicted impact, and the cost-effectiveness of the adjustment can be evaluated.

  • Workload Type: Identify the type of work performed by the system under analysis. E-commerce, internal business systems, and machine learning services, for example, each have different performance requirements and resource characteristics, so determine the resources each needs through workload analysis.

  • Usage Rate and Performance Indicators: Analyze how resource usage changes and derive resource adjustment measures for periods of maximum and minimum usage. Use performance indicators such as response time and latency alongside the usage data.

  • Request Load Type: Analyze the traffic pattern of the request load, and determine the direction of resource adjustment according to whether the workload is centered on database transactions or on content delivery.

To support the decisions needed for usage prediction and resource adjustment, you can perform the following tasks.

  • Use log files and data extracted from monitoring tools, including Cloud Monitoring, to gain insight into workloads. Collect data on periodic changes, and identify both short-term fluctuations and longer-term increases or decreases in demand.

  • Work with departments that can affect demand to check whether any events are planned.
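One simple way to look for seasonality in monitoring data is a per-month seasonal index: the average usage in each calendar month divided by the overall average. The sketch below assumes the data has already been exported as `(year, month, value)` tuples; the function name, input shape, and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_index(monthly_usage):
    """Return {month: index}; an index above 1 means above-average demand.

    monthly_usage is a list of (year, month, value) tuples that covers
    every calendar month at least once (one year of data or more).
    """
    by_month = defaultdict(list)
    for _year, month, value in monthly_usage:
        by_month[month].append(value)
    if len(by_month) < 12:
        raise ValueError("need data for all 12 calendar months")
    overall = mean(value for _year, _month, value in monthly_usage)
    if overall == 0:
        raise ValueError("usage is zero for the whole period")
    return {month: mean(values) / overall for month, values in sorted(by_month.items())}


# Hypothetical data: flat usage with a peak in November.
usage = [(2023, m, 200.0 if m == 11 else 100.0) for m in range(1, 13)]
index = seasonal_index(usage)
peak_month = max(index, key=index.get)
print(f"peak month: {peak_month}, index: {index[peak_month]:.2f}")
```

With several years of data, each month's index averages over more observations, which makes one-off events less likely to be mistaken for a seasonal pattern.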

Resource Optimization

Resource type, size, quantity adjustment

Resource optimization is all about striking a balance between the two goals of cost reduction and service stability.

Resizing, which optimizes the size and quantity of resources, is a data-driven strategic decision rather than a simple technical adjustment. The points to review are as follows.

  • Considering Resource Attributes and Costs: Selecting the right type, size, and quantity of resources lets you meet technical requirements at minimum cost. When resizing, consider not only every resource in the workload and its attributes, but also the labor cost of the adjustment work itself. If the labor cost of resizing exceeds the cost it can save, it is better to resize once, when the service changes or is discontinued, than to repeat the work regularly. Resizing also requires visibility into how much of each resource is actually being used. This includes CPU usage, memory usage, network throughput, and disk usage; server type and disk capacity can be determined from this data.

  • Resizing for Resource Optimization: Resizing should not be performed solely to reduce cost, and care must be taken that it does not harm service operations. The main goal of the operations team is to stably maintain the capacity the service requires. Resizing resources that support commercial applications is especially complex because of licensing and similar issues, so a careful approach is needed. Resource costs vary widely, and focusing optimization effort on high-cost resources is usually more efficient. As noted above, if the labor cost of resizing exceeds the resource cost it can save, the work is not worthwhile. Decide whether to resize based on the upper bound of what can be saved: targeting a high-spec database rather than a low-spec virtual server can be expected to yield a larger cost reduction.

  • Data-based Cost Optimization Strategy: Data-based resizing can be used to adjust capacity by scaling up, and scaling out can be considered alongside it. Scaling out adds computing nodes and can be done either manually or automatically. Examples of manual adjustment include increasing the number of Virtual Servers behind a Load Balancer, or adjusting the number of node pools or pod replicas in a Kubernetes cluster. These tasks can also be automated by using metrics such as average CPU usage. By specifying the minimum and maximum number of worker nodes, you set the capacity range of the computing resources the workload needs, and cost is optimized by adjusting capacity dynamically within that range according to the metrics.
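Two of the decisions described in the list above lend themselves to short calculations: whether a resize pays for its own labor cost, and how many nodes a metric-driven scale-out should target within a minimum and maximum. The sketch below is illustrative only; the function names and figures are hypothetical, and the proportional rule shown is the same form as the one documented for the Kubernetes Horizontal Pod Autoscaler, not a description of any specific Samsung Cloud Platform autoscaler.

```python
import math


def resize_pays_off(monthly_saving, labor_cost, remaining_months):
    """True if savings over the resource's remaining life exceed the labor cost."""
    return monthly_saving * remaining_months > labor_cost


def desired_nodes(current, avg_cpu, target_cpu, min_nodes, max_nodes):
    """Proportional scale-out rule, clamped to the [min_nodes, max_nodes] range.

    avg_cpu and target_cpu are average CPU usage percentages.
    """
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    wanted = math.ceil(current * avg_cpu / target_cpu)
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical figures: saving 40 per month for 6 months against 300 of labor.
print(resize_pays_off(monthly_saving=40.0, labor_cost=300.0, remaining_months=6))

# Hypothetical cluster: 4 nodes at 90% average CPU with a 60% target.
print(desired_nodes(current=4, avg_cpu=90, target_cpu=60, min_nodes=2, max_nodes=10))
```

Clamping to the minimum keeps the capacity the service needs for stability, while the maximum caps the cost that automatic scaling can incur.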

Idle resource disposal

Among the tasks required for resource optimization in the cloud, managing idle resources is the most important and effective.

Resources are created because they are needed, but some stop being used over time and remain in place without being deleted or scaled down.

Dealing with these idle resources is essential and can have the greatest effect on reducing cloud costs.

  • Idle Resource Management Procedure: In a cloud environment, unused idle resources tend to appear over time. These resources should be deleted, but without active management they can be left in place through administrator oversight or a lack of visibility into resources. An administrator may hesitate to delete resources they did not create themselves, since those resources may contain important data or be preserved for a specific purpose. Disposing of idle resources therefore requires both a way to check each resource's usage history and a defined disposal procedure.

  • Idle Resource Lifecycle Management: First, manage tags that describe each resource's lifecycle. From a lifecycle perspective, it should be possible to tell whether a resource is for testing and when that testing ends. If a resource uses a license with an expiration date, the administrator should be able to identify that as well. This information can be recorded in tags, and a tag policy covering the resource lifecycle makes the purpose and expiration date of each resource clear. Next, establish a disposal process, and ensure that every resource carries information about the responsible department and person in charge. Before resources are discarded, notify stakeholders in advance and carry out a confirmation step to prevent data loss. If resources are classified by the importance of the work they support, disposal of low-importance resources can also be automated.
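The tag-based lifecycle policy described above can be sketched as a small classifier. This is a minimal illustration that assumes resources have been exported as dictionaries with a `tags` mapping; the tag keys `expires-on`, `owner`, and `importance`, the function name, and the sample inventory are all hypothetical and should be replaced with whatever your tag policy defines.

```python
from datetime import date

# Hypothetical tag keys; substitute the keys defined in your tag policy.
EXPIRES_TAG = "expires-on"
OWNER_TAG = "owner"
IMPORTANCE_TAG = "importance"


def disposal_plan(resources, today):
    """Classify resources by lifecycle tags.

    Returns (auto_delete, notify_owner, untagged) lists of resource names.
    """
    auto_delete, notify_owner, untagged = [], [], []
    for res in resources:
        tags = res.get("tags", {})
        expires = tags.get(EXPIRES_TAG)
        if expires is None or OWNER_TAG not in tags:
            untagged.append(res["name"])  # cannot be judged: fix the tags first
            continue
        if date.fromisoformat(expires) > today:
            continue  # still within its planned lifetime
        if tags.get(IMPORTANCE_TAG) == "low":
            auto_delete.append(res["name"])
        else:
            notify_owner.append(res["name"])  # needs confirmation before disposal
    return auto_delete, notify_owner, untagged


# Hypothetical inventory export.
inventory = [
    {"name": "test-vm-01", "tags": {"expires-on": "2024-01-31", "owner": "qa", "importance": "low"}},
    {"name": "report-db", "tags": {"expires-on": "2024-02-15", "owner": "finance", "importance": "high"}},
    {"name": "web-01", "tags": {"expires-on": "2030-12-31", "owner": "web"}},
    {"name": "unknown-disk", "tags": {}},
]
plan = disposal_plan(inventory, today=date(2024, 3, 1))
print(plan)
```

Only resources explicitly tagged as low importance are placed in the automatic list; everything else that has expired goes through the owner notification and confirmation step to avoid data loss.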