
Design for Performance Efficiency

One of the frequently used analogies when explaining system performance is the flow of water through a pipe.

If we liken water flow to performance, the amount of flow is determined by the water’s speed and the pipe’s diameter.

A system’s performance is similar: the speed of the water corresponds to response time, and the diameter of the pipe corresponds to the number of tasks that can be processed simultaneously.

Conceptual Diagram
Fig. Information System Performance

If viscous oil rather than water flows through the pipe, the amount flowing over the same time period will differ, even if a pipe of the same thickness is used.

Similarly, in information systems, processing speeds may vary depending on the characteristics of the tasks being processed, and performance inevitably varies as a result.

Therefore, the required level of performance may vary depending on user characteristics and environment, and user satisfaction may differ even with identical performance.

This means that there is no standard that can be applied uniformly to all systems and users.

Considering these points, ‘performance’ can be defined as the amount of work a system can complete while responding to user requests within the expected time for specific tasks.

In traditional IT environments, performance management focused on maximizing throughput within limited physical resources, and the primary task was optimizing individual components of information systems.

However, the recent IT environment focuses on efficiently meeting increasing service demands and technical requirements by leveraging Application infrastructure and resources based on cloud technology, rather than on partial optimization.

Considerations for Performance Efficiency

Considering Cost and Performance Trade-offs

Deploying resources in advance or over-provisioning to meet performance requirements may result in unexpected costs.

Also, when adopting an over-provisioning strategy, if demand spikes and exceeds the limits of the pre-provisioned resources, resource adjustments become necessary during the incident response phase.

Incorporating a demand-based resource scaling strategy into architecture design enables the flexibility to dynamically resize workload components, which contributes to enhancing the performance efficiency of cloud workloads.

Best Practice
Design an architecture that avoids investing in capacity not required by the functional requirements, while allowing performance to scale as needed.

Auto-Scaling is an effective means of horizontally scaling computing resources based on demand, but if scaling policies are not properly configured, unnecessary scaling may occur, or sufficient performance may not be provided when needed.

By analyzing demand load and establishing policies that match its characteristics, you can reduce cost waste and secure the required computing performance.
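As a minimal sketch of a demand-matched scaling policy, the target-tracking idea described above can be expressed as a function that derives a desired replica count from observed CPU utilization. The function name, thresholds, and bounds here are illustrative assumptions, not an actual Samsung Cloud Platform API:

```python
import math

def desired_replicas(current_replicas: int, avg_cpu: float,
                     target_cpu: float = 60.0,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Target-tracking scale decision: keep average CPU near target_cpu.

    All thresholds are hypothetical defaults for illustration.
    """
    if current_replicas <= 0:
        raise ValueError("current_replicas must be positive")
    raw = math.ceil(current_replicas * avg_cpu / target_cpu)
    # Clamp to configured bounds so we neither over-provision nor starve.
    return max(min_replicas, min(max_replicas, raw))

# A spike to 90% CPU on 4 replicas suggests scaling out to 6.
print(desired_replicas(4, 90.0))  # → 6
```

Setting `min_replicas` and `max_replicas` from the demand analysis is what prevents both unnecessary scaling and under-provisioning during spikes.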

Implementing a caching strategy also requires sophisticated design.

Caching is a technology for delivering frequently used content with low latency. However, if content is not frequently used or the caching retention period (Time-To-Live, TTL) is not suitable for the use case, it may incur unnecessary costs or fail to provide latency improvements.
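The TTL trade-off described above can be illustrated with a minimal in-memory cache sketch; this is a toy model for explanation, not a production caching layer:

```python
import time

class TTLCache:
    """Minimal TTL cache: entries expire ttl_seconds after insertion."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expiry = entry
        if time.monotonic() >= expiry:
            del self._store[key]  # expired: evict and report a miss
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.put("page", "<html>...</html>")
print(cache.get("page"))  # fresh entry: cache hit
time.sleep(0.06)
print(cache.get("page"))  # TTL elapsed: miss, content must be re-fetched
```

A TTL that is too short forces constant re-fetching (no latency benefit); one that is too long risks serving stale content and wasting cache storage on rarely used entries.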

설계 원칙
  1. Configure the Auto-Scaling policy to match user demand.
  2. Improve performance efficiency through proper caching implementation.

Selecting a Cloud Service Suitable for Your Requirements

Select appropriate cloud services based on the workload’s performance goals and future capacity requirements.

  • Select a service that meets the performance goals of the requirements. Samsung Cloud Platform offers various computing options such as Virtual Machine, Bare Metal, and Serverless Computing. For example, when you need to build a database that requires high-availability, high-performance OLTP, instead of deploying the database on a Virtual Server or using a DBaaS service, you can configure redundant database servers on two Bare Metal Servers and connect Block Storage (BM) using a Multi-Attach method.

  • Meet compliance and constraint requirements. While the cloud offers various services, choices may be limited due to specific regulations or restrictive requirements. For example, even if you wish to use DBaaS for performance and operational reasons, using the service may be difficult if there is a compliance requirement to apply a specific encryption module that the DBaaS does not support when storing data.

  • Consider organizational capabilities. Microservice Architecture can be a strategy to maximize efficiency in terms of performance and operations, but implementing it requires the organization’s technical capabilities, processes, and culture to support it. Attempting technical innovation without preparation increases the likelihood of failure.

Designing the Overall Service Flow Considering Performance

A flow refers to a series of processes for performing a specific task within a workload.

The series of processes where a user request event is generated, a message is sent to the server to handle it, and the response is processed to send a reply can be called a flow.

To optimize the performance of information systems, it is important to understand all flows. By analyzing workloads in individual flow units, you can identify bottlenecks or inefficiencies in resource usage.

To do this, apply analysis and tracing tools to each component, set performance metrics, and collect data for a specific period.

Based on the collected data, identify critical flows and prioritize them for performance improvement.

Critical flows are the customer’s primary user flows, or the system and data flows corresponding to core tasks within the workload.

As with the Business Impact Analysis covered in the Reliability Design Principles (for details, see Reliability Design Principles > I. Business Impact Analysis and Recovery Objective Definition > 1. Business Impact Analysis), assess business impact to identify critical flows, analyze their performance metrics, and establish improvement objectives.

Provide dedicated resources and sufficient capacity for identified critical flows to ensure a stable computing environment.

  • Dedicated resource configuration for critical flows: Critical flows must be configured to operate independently without interference from other processes, and a separate VPC or subnet can be utilized for this purpose.

  • Software-level flow isolation: Separate flows at the VM (virtual machine) or container level to minimize interference from other flows.

  • Securing dedicated resources and capacity: Minimize resource sharing and allocate dedicated resources or capacity to critical flows to ensure stable execution.

The following is an example of an issue that occurred in a critical flow.

A certain company operates a high-availability 3-tier website architecture for online recruiting.

However, the HR team is experiencing issues with application submissions due to sudden traffic spikes during specific periods.

Tens of thousands of applicants flock during the recruiting season, but system load is significantly lower during the off-season.

In an on-premises environment, you must purchase additional equipment to handle this load. However, after the recruitment season, this equipment becomes idle, leading to increased management overhead and reduced cost efficiency.

These problems can be addressed using a cloud environment.

Configuration Diagram
※ Multi-AZ is scheduled for release in the future (2026)

Auto-Scaling is applied to the main-flow workloads, the ❶ Web Server and ❷ Application Server, and high availability is implemented for the ❸ database using DBaaS.

In contrast, the non-critical-flow components, the ❹ Bastion Server and ❺ Application Server, are configured as standalone servers.

Cloud Performance Improvement Areas

Latency

Latency refers to the time it takes to transmit data over a network.

If the distance between the end user and the information system is close and the system’s response speed is fast, the latency is short. Conversely, if the distance is far or response times are slow, latency increases.

When network latency increases, Application performance degrades, and if it exceeds a certain threshold, it can lead to system failure.

Conceptual Diagram
Figure. Response time and latency

Network latency occurs due to various factors.

First, latency may occur due to network lines and intermediate equipment such as routers between the request location and the receiving location.

Each router that a data packet passes through from source to destination counts as a hop, and latency increases as the hop count increases.

Additionally, delays can occur within the data center due to perimeter security devices such as firewalls and internal network equipment.

Delays can also occur due to the operating mechanisms within the server farm, between the web, application, and database, or within the server itself.

In addition, delays may occur when the Application interfaces with other servers to return a response.

The most basic way to reduce latency is to reduce the network distance between the client and the server.

You can place the server in a region that is geographically close to the client, or configure a CDN to handle responses directly on edge servers.
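As a toy illustration of the hop-count effect described above, end-to-end latency can be modeled as the sum of per-hop delays; the delay values below are made-up numbers, and real latency also depends on propagation distance, queuing, and processing time:

```python
def total_latency_ms(hop_delays_ms):
    """End-to-end one-way latency as the sum of per-hop delays (toy model)."""
    return sum(hop_delays_ms)

# Hypothetical per-hop delays, in milliseconds.
origin_path = [2.0, 8.0, 15.0, 25.0, 5.0]  # client → ISP → backbone → origin
edge_path = [2.0, 8.0]                     # client → nearby CDN edge server

print(total_latency_ms(origin_path))  # → 55.0
print(total_latency_ms(edge_path))    # → 10.0
```

Serving the response from a nearby edge removes the backbone and origin hops entirely, which is why CDNs reduce latency.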

The following figure shows the inbound traffic latency of a website containing various content.

As the number of network hops decreases, the latency on the far right decreases.

Conceptual Diagram
Figure. Webpage Content Latency

Throughput

Along with designs that reduce latency by shortening the distance between users and servers, increasing network throughput is also one of the ways to improve network performance.

Throughput means the amount of work processed per unit time.

Throughput and latency are closely related.

Being able to transmit more data in a short time means low latency and high throughput.

Network throughput is expressed as the amount of data transmitted per second (Bytes/second or bits/second).

At the operating system level, throughput is determined by the amount of data transferred per second between the CPU and memory. In the database field, this is expressed as the number of transactions processed per second (operations/second or Transactions/second).
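The unit relationships above can be made concrete with two small helper functions; the figures used are illustrative:

```python
def mbps_to_megabytes_per_sec(mbps: float) -> float:
    """Convert megabits/second to megabytes/second (8 bits per byte)."""
    return mbps / 8.0

def transactions_per_second(transactions: int, elapsed_seconds: float) -> float:
    """Throughput as work items processed per unit time."""
    return transactions / elapsed_seconds

print(mbps_to_megabytes_per_sec(1000))       # a 1 Gbps link moves 125.0 MB/s
print(transactions_per_second(30000, 60.0))  # 30,000 tx in a minute → 500.0 TPS
```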

When defining performance requirements, throughput requirements are expressed as the number of concurrent users and concurrent throughput, which is further covered in II. Computing Design > 1. Computing Services, Server Types, and Sizing.

The following are approaches to consider for increasing throughput.

  • Transmission server extension: You can increase the overall service throughput by placing computing nodes that perform the same task in parallel behind a load balancer.

  • Change transmission method: Implementing application connections using an API-based method instead of a session-based approach can improve resource utilization.

  • Hybrid connection configuration: If Direct Connect is required, you can review hybrid connections to select the appropriate bandwidth.
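The first approach, placing identical nodes in parallel behind a load balancer, can be sketched as a simple round-robin distributor; the backend names are hypothetical:

```python
import itertools

class RoundRobinBalancer:
    """Distribute requests across identical backend nodes in turn.

    Adding nodes raises aggregate throughput, assuming the backends
    (not the balancer itself) are the bottleneck.
    """

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def next_backend(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.next_backend() for _ in range(6)])
# → ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Real load balancers typically add health checks and weighted or least-connections policies on top of this basic rotation.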

Capacity Planning

Capacity planning is the process of making decisions regarding resource capacity by considering future workload requirements and usage patterns.

Predict usage fluctuations based on business schedules, such as seasonal changes or new product launches, and reflect them in capacity planning.

These proactive strategies can prevent service outages and improve performance efficiency.

By analyzing past usage trends and growth data, you can predict short-term and long-term capacity requirements and identify bottlenecks and Auto-Scaling issues in advance to ensure consistent workload performance.

  • Analyze data accumulated over a long period. Analyze utilization rates, performance data, and workload pattern logs accumulated for over a year to identify seasonal and cyclical demand, and incorporate the load during demand surge periods into capacity planning.

  • Identify bottlenecks. Configure a test environment separate from the production environment, generate load to measure and improve bottlenecks, and enhance overall performance.

  • Implement auto-scaling. Configure automated scaling instead of manual scaling.

Configure schedule-based Auto-Scaling, or leverage the managed services of cloud providers to utilize built-in capacity scaling.
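A schedule-based scaling rule can be sketched as a time-window lookup; the windows and replica counts below are hypothetical values tied to the earlier recruiting-season example, not a real provider configuration:

```python
from datetime import datetime, time as dtime

# Hypothetical schedule for a predictable daily demand pattern.
SCHEDULE = [
    # (start, end, replicas) — evaluated in order, first match wins
    (dtime(9, 0), dtime(18, 0), 8),   # business hours: peak capacity
    (dtime(18, 0), dtime(23, 0), 4),  # evening: moderate capacity
]
DEFAULT_REPLICAS = 2                  # overnight baseline

def scheduled_replicas(now: datetime) -> int:
    """Return the replica count for the schedule window containing `now`."""
    t = now.time()
    for start, end, replicas in SCHEDULE:
        if start <= t < end:
            return replicas
    return DEFAULT_REPLICAS

print(scheduled_replicas(datetime(2025, 3, 10, 10, 30)))  # → 8
print(scheduled_replicas(datetime(2025, 3, 10, 2, 0)))    # → 2
```

In practice, schedule-based rules are combined with metric-based Auto-Scaling so that unforeseen spikes outside the schedule are still handled.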

Defining, Measuring, and Improving Performance Goals

Defining Performance Goals, Measuring, and Establishing an Improvement Process

Performance requirements are essential elements required to provide optimized information system services to users and ensure stable operation and maintenance.

They specifically describe how quickly and efficiently the system can process work when performing a function.

Additionally, performance requirements describe the time required, throughput, and maximum resource usage when performing functions under specific conditions.

Performance requirements are important because they significantly impact the system quality perceived by end users.

System processing speed, screen response time, page errors, and downtime can be critical factors of dissatisfaction in service level management.

Therefore, these items must be explicitly specified in the request for proposal, and the performance targets must be verified through performance testing during system implementation.

The performance requirement items are as follows, and specific performance goals are established for each item.

| Category | Item | Requirement item | Example |
|---|---|---|---|
| General performance | General performance | General performance | Performance analysis tool, test plan |
| Processing speed and time requirements | Response time | Online task response time | The initial result must be returned within 3 seconds of the user request. |
| Processing speed and time requirements | Response time | Online batch job response time | Results of online batch job requests must be returned within 3 minutes. |
| Processing speed and time requirements | Response time | Batch task response time | The daily batch job must be processed within 10 minutes. |
| Processing speed and time requirements | Response time | Web page display time | Each web page must be displayed within a few seconds. |
| Processing speed and time requirements | Response time | Error response time | All error messages must be displayed within 3 seconds after the information is entered. |
| Throughput requirements | Concurrent user count | Number of concurrent users | Must support an average of at least 200 concurrent users without performance degradation. |
| Throughput requirements | Concurrent processing capability | Concurrent processing capability | The system must process 50 user basic-information entries per second under maximum load. |
| Resource usage requirements | CPU usage | CPU usage | Average CPU utilization during service uptime must not exceed 60%. |
| Resource usage requirements | Memory usage | Memory usage | Average memory usage during service uptime must not exceed 60%. |

Table. Performance Requirements

Existing information systems have used the USE methodology for performance measurement, but with the recent proliferation of cloud-native application development, the RED methodology is also used.

The USE Method

A methodology proposed by Brendan Gregg, used to analyze system bottlenecks in the early stages of performance review.

The USE methodology can be defined as follows.

“Check utilization, saturation, and errors for all resources.”

The definitions of each term are as shown in the table below.

| Term | Definition | Example |
|---|---|---|
| Resource | All physical server components | CPU, disk, memory, network, etc. |
| Utilization | The proportion of time a resource spent performing work during a specific period | Disk utilization = 90% |
| Saturation | Additional work that the resource could not process | CPU average run queue length = 4 |
| Errors | Number of errors that occurred | 50 late collisions on a network interface |

Table. The USE Method

The procedure for the USE methodology is as follows.

Conceptual Diagram
Figure. USE flow

The USE methodology is performed by first checking for errors according to the flowchart, then sequentially confirming utilization and saturation.

Generally, since errors can be intuitively verified, checking for errors first and then analyzing the remaining items is effective for saving time.

For example, if CPU usage is 100%, that point is likely a bottleneck.

When checking these metrics, you should also consider their update cycle.

Cloud Monitoring’s minimum metric update interval is 1 minute, so check whether a VM’s 100% utilization persists across intervals or occurs only momentarily.
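The check order in the flowchart, errors first, then utilization, then saturation, can be sketched as a short function. The metrics dictionary and thresholds here are illustrative assumptions, not a specific monitoring API:

```python
def use_check(resource: dict,
              util_threshold: float = 80.0,
              sat_threshold: float = 1.0):
    """Apply the USE check order to one resource's metrics.

    `resource` is a hypothetical metrics dict, e.g.
    {"name": "cpu", "errors": 0, "utilization": 95.0, "saturation": 3.2}.
    Returns the first finding, or None if the resource looks healthy.
    """
    if resource.get("errors", 0) > 0:                       # 1. errors first
        return f'{resource["name"]}: {resource["errors"]} errors'
    if resource.get("utilization", 0.0) >= util_threshold:  # 2. then utilization
        return f'{resource["name"]}: utilization {resource["utilization"]}%'
    if resource.get("saturation", 0.0) >= sat_threshold:    # 3. then saturation
        return f'{resource["name"]}: saturation {resource["saturation"]}'
    return None

cpu = {"name": "cpu", "errors": 0, "utilization": 100.0, "saturation": 4.0}
print(use_check(cpu))  # → 'cpu: utilization 100.0%'
```

Running this over every resource (CPU, memory, disk, network) reproduces the "check all resources" discipline of the methodology.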

The USE methodology can be applied mechanically without a deep understanding of the software and has been effectively utilized in cloud environments as well as on-premises environments.

This methodology primarily monitors physical hardware resources; on the software side, it covers only basic resources common to most systems.

Therefore, it does not rely on software logic and can be applied regardless of the software used.

However, there are limitations in applying this to Microservice architecture that uses partitioned physical resources.

In these cases, using the RED methodology is appropriate.

The RED Method

This is a method proposed by Tom Wilkie for performance analysis in microservice environments. (The RED Method: How to Instrument Your Services | Grafana Labs)

“For all services, monitor rate (processing rate), errors (error count), duration (processing time).”

The metrics of the RED methodology consist entirely of request-based items.

Each term is defined as follows:

| Term | Definition | Example |
|---|---|---|
| Rate | Requests per second | 500 requests per second |
| Errors | Number of failed requests | 5 failed requests per second |
| Duration | Request processing time | Average request processing time = 200 ms |

Table. The RED Method

If the USE methodology is hardware-centric, the RED methodology focuses on service request-centric metrics.

The RED method is useful for analyzing response latency or errors in web applications within a microservice environment.

The RED methodology can also be applied easily without dismantling or analyzing the internal structure of hardware or software.

However, since Cloud Monitoring cannot collect RED metrics, using tools such as Prometheus is effective.
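The three RED metrics can be derived from a simple request log, as in the sketch below; the record format is a hypothetical example, and a real deployment would use instrumentation such as Prometheus client libraries rather than post-hoc log analysis:

```python
def red_metrics(requests, window_seconds: float):
    """Compute Rate / Errors / Duration from a list of request records.

    Each record is a hypothetical dict: {"status": int, "duration_ms": float}.
    """
    rate = len(requests) / window_seconds                  # Rate: requests/sec
    errors = sum(1 for r in requests if r["status"] >= 500)  # Errors: failures
    durations = sorted(r["duration_ms"] for r in requests)
    # Duration: median here; production systems usually track p95/p99 too.
    median = durations[len(durations) // 2] if durations else 0.0
    return {"rate_rps": rate, "errors": errors, "median_ms": median}

log = [
    {"status": 200, "duration_ms": 120.0},
    {"status": 200, "duration_ms": 80.0},
    {"status": 503, "duration_ms": 1500.0},
    {"status": 200, "duration_ms": 95.0},
]
print(red_metrics(log, window_seconds=2.0))
# → {'rate_rps': 2.0, 'errors': 1, 'median_ms': 120.0}
```

Note that all three metrics are per-service and request-based, which is what makes them applicable to microservices regardless of the underlying hardware.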

Continuous Optimization and Improvement

Best Practice
Achieve and maintain performance efficiency targets through continuous performance optimization.

Continuous performance optimization refers to a series of processes that monitor and analyze system performance and continuously perform improvement activities.

The goal of performance efficiency is to provide responses within the time expected by users by adjusting resources according to changes in demand.

The performance of information systems can degrade over time.

Therefore, various variables both inside and outside the system, such as demand fluctuations and the complexity resulting from increased functions and interfaces, must be considered.

To achieve performance efficiency goals even amidst continuous change, the following optimization and improvement strategies are required.

Design Principles
  1. Review and implement new cloud technologies.
  2. Specify the improvement priority.
  3. Automate performance optimization.
  • Review and apply new cloud technologies. Cloud providers continuously introduce new technologies to infrastructure and software platforms. Therefore, you should regularly review and apply these technologies. If technical support or updates for previous versions of the platform are discontinued, it may negatively impact security and availability, as well as performance.

  • Prioritize improvements. Over time, technologies that were optimized at the time of construction may become inefficient. For example, while a specific query served as a critical flow in the database, other queries may become important due to trend changes. Initially, you may have achieved performance goals by optimizing resources and queries for a specific flow. However, if queries and resource planning are not optimized for newly concentrated loads as trends change, it can lead to overall performance degradation. In such cases, you need to adjust priorities and change query and resource capacity planning.

  • Automate performance optimization. Automation eliminates repetitive and time-consuming manual processes, reducing the likelihood of errors and ensuring consistency. To achieve this, apply automation to tasks such as performance testing, deployment, and monitoring. Even when applying the USE or RED method, set thresholds for the target performance metrics and configure automatic alerts so the administrator is notified immediately when a specific event occurs. Additionally, establish a plan in advance to enable rapid response via automated scripts in the event of an emergency.
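The threshold-alert step described above can be sketched as a small evaluation function; both dictionaries are hypothetical, and in practice the metric values would come from a monitoring agent and the alerts would be routed to a notification channel:

```python
def check_thresholds(metrics: dict, thresholds: dict):
    """Return alert messages for every metric that crosses its threshold.

    Metric names and limits are illustrative assumptions.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds threshold {limit}")
    return alerts

metrics = {"cpu_pct": 92.0, "mem_pct": 55.0, "error_rate": 0.02}
thresholds = {"cpu_pct": 80.0, "mem_pct": 80.0, "error_rate": 0.01}
for alert in check_thresholds(metrics, thresholds):
    print(alert)  # e.g. cpu_pct=92.0 exceeds threshold 80.0
```

Running such a check on every metric collection cycle, and wiring the resulting alerts to automated remediation scripts, is the basis of the automated response plan mentioned above.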