Computing Design
Computing Services, Server Types, and Sizing
Choosing the Right Compute Service for Your Workload
The computing service specifications provided by Samsung Cloud Platform are as follows.
| Product | Type | CPU | Memory | Option | Value |
|---|---|---|---|---|---|
| Virtual Server | Standard | 1/2/4/6/8/10 vCore | 2~160GB | Max Network Bandwidth | 10Gbps |
| Virtual Server | Standard | 12/14/16 vCore | 24~256GB | Max Network Bandwidth | 12.5Gbps |
| Virtual Server | High Capacity | 24/32/48/64/72/96/128 vCore | 48~1,536GB | Max Network Bandwidth | 25Gbps |
| GPU Server | A100(80G) | 16/32/64/128 vCore | 240~1,920GB | GPU | 1~8 |
| GPU Server | H100(80G) | 12/24/48/96 vCore | 240~1,920GB | GPU | 1~8 |
| Bare Metal Server | 3rd Gen | 16/32/64/96/128 vCore | 64~2,048GB | Physical CPU | 8~64 |
Note: You can check the latest server types on the pages below.
- Virtual Server: https://cloud.samsungsds.com/serviceportal/services/compute/virtualServer.html
- GPU Server: https://cloud.samsungsds.com/serviceportal/services/compute/gpuServer.html
- Bare Metal Server: https://cloud.samsungsds.com/serviceportal/services/compute/baremetal.html
Virtual Server
Virtual Server provides a Standard (s1) type of up to 16 vCores and a High Capacity (h2) type of 24 vCores or more. The Standard type uses Intel Ice Lake CPUs, with 1 vCore / 2GB as the minimum specification; from 2 vCore to 16 vCore, CPU:Memory ratios of 1:2, 1:4, 1:8, and 1:16 are offered. The High Capacity type uses Intel Sapphire Rapids CPUs and offers CPU:Memory ratios of 1:2, 1:4, 1:8, and 1:12, from 24 vCore to 128 vCore. Supported operating systems include RHEL, Ubuntu, Alma, Rocky, Oracle Linux, and Windows Server, and you can also configure Kubernetes images, Data Service Console images, and so on. Virtual Server can be used for various purposes, such as development, testing, and application execution, depending on the user's computing needs.
Bare Metal Server
A Bare Metal Server is a high-performance cloud computing service that does not use virtualization technology and exclusively allocates physically isolated computing resources such as CPU and memory. The 3rd generation service, using Intel Sapphire Rapids, is currently available. The CPU:Memory combinations for the provided server types come in 1:4, 1:8, and 1:16 ratios. The default internal disk for the OS is 480GB*2 for 16 vCore, 960GB*2 for 32 vCore, and 1.92TB*2 for 96/128 vCore. Bare Metal Servers are suitable for workloads requiring high capacity and high performance, such as real-time systems, HPC (High Performance Computing), and servers with heavy I/O usage. Additionally, you can leverage the Multi-Attach feature to configure databases that require Active-Active high availability.
Server Sizing
After selecting a computing service suitable for the workload, you must determine the server specifications and quantity based on availability and performance requirements.
While determining server specifications and quantities was a critical process in on-premises environments, in cloud environments, it becomes a flexible task that can be changed at any time.
This is because adjustments can be made later, even if there is a difference between the initially set specifications and the actual required specifications.
Nevertheless, server sizing is important because we must calculate the workload operating cost (monthly fee) in the cloud and, based on that, derive the TCO (Total Cost of Ownership) compared to on‑premises deployment.
To estimate the hardware scale of an information system, the following three methodologies can be considered.
| Category | Concept | Advantage | Disadvantage |
|---|---|---|---|
| Formula calculation method | Method for calculating capacity values based on factors such as user count for sizing estimation, and applying correction factors. | It can clearly present the basis for scale estimation and can be calculated more simply than other methods. | If the correction factor is incorrect, a large discrepancy from the desired value occurs, and providing accurate supporting data for the correction factor is difficult. |
| Reference method | Depending on the workload (number of users, DB size), estimate a comparable system size by comparing approximate scales against baseline data. | Since it can be compared with an existing, already-implemented business system, a relatively safe scale estimation is possible. | Because it relies on comparison rather than calculation, the supporting rationale is weak. |
| Simulation method | Model and simulate the workload of the target task, and estimate the scale from the results. | Relatively accurate values can be obtained. | It is time-consuming and costly. |
The formula-based and reference methods estimate the resource usage of servers to be deployed on the cloud from various metrics.
Generally, cloud capacity planning finds the capacity balance point by adjusting through simulation or actual operation.
However, sizing is still often required for purposes such as cost estimation or proposals.
The formula-based calculation method can provide objective capacity design criteria because it calculates server capacity from various metrics.
Web/WAS CPU Sizing Based on Formula Calculation
First, calculate the CPU capacity of the Web/WAS server using the formula.
| Calculation item | Calculation basis | Range | Default value |
|---|---|---|---|
| Number of concurrent users | Users who simultaneously use software or systems over a network | - | Estimated value |
| Number of operations per user | Number of business logic operations generated per second by a single user | 3 ~ 6 | 5 |
| Basic OPS correction | Correction factor for adapting the OPS (Operations Per Second) measured in the test environment to a complex real-world environment | - | 3 |
| Business-use correction | Correction factor based on the type of target system | 0.7 (Web server) / 2 (WAS server) | - |
| Interface load correction | Correction factor that accounts for the load generated by interfaces when communicating between servers | 1.1 ~ 1.2 | 1.1 |
| Peak-time load correction | Correction factor to absorb load caused by a sudden surge of connections | 1.2 ~ 1.5 | 1.3 |
| Integration load correction | Correction factor accounting for the workload generated by integration with other systems | 1 ~ 1.3 | 1 |
| Cluster correction | Correction factor for failure scenarios in a cluster environment (applied based on the number of nodes) | 1 ~ 1.5 | - |
| System margin | Additional margin for stable operation, considering unexpected increases in workload, etc. | 1.3 | - |
| System target utilization | Maximum CPU utilization target for stable system operation | 0.7 | - |
| Unit correction | Conversion factor for converting the calculation result to max-jOPS units | 24 ~ 31 | - |
| Calculation formula | CPU (max-jOPS) = (concurrent users * operations per user * basic OPS correction * business-use correction * interface load correction * peak-time load correction * integration load correction * cluster correction * system margin) / (system target utilization * unit correction) | | |
| Core estimation | Estimated jOPS / jOPS per reference core | | |
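The formula above can be sketched as a small calculation helper. This is an illustrative sketch, not an official tool: the function names and the example inputs (1,000 concurrent users, a 2-node Active-Active WAS cluster, and a hypothetical 800 max-jOPS per reference core) are assumptions for demonstration only.

```python
import math

def webwas_required_jops(concurrent_users, ops_per_user=5, base_ops_corr=3,
                         business_corr=2.0, interface_corr=1.1, peak_corr=1.3,
                         integration_corr=1.0, cluster_corr=1.0,
                         system_margin=1.3, target_util=0.7, unit_corr=30):
    """Estimated CPU capacity in max-jOPS, per the formula in the table above.

    Defaults follow the table: 5 operations/user, basic OPS correction 3,
    business-use correction 2 (WAS), interface 1.1, peak 1.3, margin 1.3,
    target utilization 0.7, unit correction 30 (Composite, type unspecified).
    """
    return (concurrent_users * ops_per_user * base_ops_corr * business_corr
            * interface_corr * peak_corr * integration_corr * cluster_corr
            * system_margin) / (target_util * unit_corr)

def cores_needed(required_jops, jops_per_core):
    """Core estimation: estimated jOPS / jOPS per reference core, rounded up."""
    return math.ceil(required_jops / jops_per_core)

# Example: WAS server, 1,000 concurrent users, 2-node Active-Active cluster (1.4)
jops = webwas_required_jops(1000, cluster_corr=1.4)
cores = cores_needed(jops, jops_per_core=800)  # 800 is a hypothetical benchmark value
```

Changing `business_corr` to 0.7 gives the corresponding Web server estimate with the same inputs.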
- Concurrent users
Concurrent users are users who simultaneously use software or a system over a network, and are typically defined on a session basis (from business service request to service termination).

First, estimate the total number of system users. The total number of users usually refers to all users registered in the system, but in practice means users with access permissions. For web services, however, unspecified users can connect, so estimation is required. Next, calculate the number of connected users as a specific percentage of the total number of users. A connected user is a user who is online and may generate transactions or operations, or may simply remain connected. Finally, estimate the number of concurrent users by multiplying the number of connected users by a specific ratio.

In a 3-tier web application, the user counts on the Web server, WAS server, and DB server are closely related: the number of concurrent users on the WAS server will not exceed that of the Web server, and the number of concurrent users on the DB server will not exceed that of the WAS server. Taking these relationships into account, you can estimate the number of concurrent users for each tier. The table below shows typical concurrent-user estimates for information systems.
| Category | Type | Concept |
|---|---|---|
| Web server | External service | Estimate connected users as roughly 1% ~ 10% of total users, and concurrent users as roughly 5% ~ 30% of connected users. |
| Web server | Large content service | Estimate connected users as about 30% ~ 50% of total users, and concurrent users as 40% ~ 70% of connected users. |
| WAS server | - | Calculated as 50% ~ 100% of the Web server's estimated concurrent users, with a typical value of 75%. |
| DB server | - | Calculated as 50% ~ 100% of the WAS server's estimated concurrent users, with a typical value of 75%. |
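The tier-by-tier chain above can be sketched as follows. The ratios and the 100,000-user input are illustrative assumptions within the ranges given in the table, and the function name is hypothetical.

```python
def concurrent_users_by_tier(total_users, connected_ratio=0.05,
                             concurrent_ratio=0.20, was_ratio=0.75, db_ratio=0.75):
    """Chain total users -> connected users -> concurrent users per tier.

    Defaults assume an external web service (connected 5% of total,
    concurrent 20% of connected) and the typical 75% WAS/DB ratios.
    """
    connected = total_users * connected_ratio
    web = connected * concurrent_ratio   # concurrent users on the Web server
    was = web * was_ratio                # never exceeds the Web tier
    db = was * db_ratio                  # never exceeds the WAS tier
    return {"connected": connected, "web": web, "was": was, "db": db}

# Example: 100,000 total users for an external web service
tiers = concurrent_users_by_tier(100_000)
```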
- Number of operations per user
Operations per user is the number of business logic operations generated by a single user per second, assumed to be approximately 3 to 6 depending on the business type.
| Applied value | Explanation |
|---|---|
| 3 | Web service–focused tasks (referring to query‑oriented work rather than complex application logic) |
| 4 | Web service and application logic are mixed, but the work primarily focuses on the web service. |
| 5 | Web service and application logic are mixed in roughly equal proportions |
| 6 | Application-logic‑centric tasks |
- Basic OPS correction
The OPS figures provided by SPEC (Standard Performance Evaluation Corporation) are measured in an optimal environment and differ from actual operating environments. Therefore, the OPS values measured in the test environment must be calibrated to suit the complex real-world environment; this is called basic OPS correction. A fixed value of 3 is applied.
- Business-use correction
There is a relative difference in workload between Web servers and WAS servers. Taking this difference into account, different correction values are applied depending on the system type; this is called business-use correction. Apply a correction factor of 0.7 for Web servers and 2 for WAS servers.
| Applied value | Explanation |
|---|---|
| 0.7 | Web server |
| 2 | WAS |
- Peak-time load correction
To improve operational efficiency and obtain accurate, immediate results, the system must operate stably during peak times, when the workload is most concentrated. Therefore, system scale should be estimated based on peak time. Typically, a system receives approximately 20% to 50% more load during peak times than during normal times; considering this, apply a weight of 1.2 to 1.5 to the calculated capacity.
| Category | Applied value | Explanation |
|---|---|---|
| High | 1.5 | When an excessively high load occurs at a specific time or on a specific day |
| Medium | 1.4 | When excessive load occurs around a specific deadline |
| Low | 1.3 | When there is a daily or weekly peak time in a specific time slot |
| Other | 1.2 | When a peak time exists but there is no significant load difference |
- Cluster correction
Cluster correction applies when two or more systems are configured as a single cluster. If a failure occurs on one server, the remaining servers must handle the entire load required by the application, so an additional reserve margin must be provided; without it, overload makes normal operation difficult. The reserve ratio varies with the cluster configuration. In an Active-Active architecture, each counterpart system would ideally hold a 100% reserve, but since this is uneconomical and inefficient, a value of 1.3 ~ 1.5 is applied: 1.4 ~ 1.5 for a 2-Node configuration and 1.3 for a 3-Node configuration. In an Active-Standby architecture, one server runs the actual service while the other serves as a standby; in the event of a failure, the standby takes over the entire function of the affected server. No separate cluster correction is applied in this case.
| Clustering | Node | Applied value |
|---|---|---|
| Active-Active | 2-Node | 1.4 ~ 1.5 |
| Active-Active | 3-Node | 1.3 |
| Active-Standby | - | 1 |
- Integration load correction
This is a correction factor that accounts for the workload generated by integration with other systems, rather than the load caused by concurrent users. Since inter-system integrations generally link to the WAS server rather than the Web server, we apply a correction value of 1 to the Web server. On the other hand, for WAS servers, a separate correction coefficient can be applied as follows, depending on the frequency of integration transactions or processing complexity.
| Category | Applied value | Explanation |
|---|---|---|
| Web server | 1 | In the case of a web server |
| WAS | 1 | When there are no linked tasks among all WAS operations (0%) |
| WAS | 1.1 | If the integrated tasks among all WAS operations consist only of simple query tasks (10% of the total load) |
| WAS | 1.2 | If the only linked task among the total WAS workload is internal update work (20% of the overall load) |
| WAS | 1.3 | If the overall WAS workload includes linked tasks for internal/external updates (30% of the total load). |
- System margin
System margin is a correction value for stable operation even in the event of unexpected workload increases or abnormal traffic. On-premises systems generally apply an additional 30% margin, i.e., a correction factor of 1.3.
- System target utilization
Generally, information systems are designed on the basis of a 100% target utilization rate, but to ensure stable operation, the actual utilization rate is managed so that it does not reach 100%. The maximum CPU utilization for stable system operation is called the system target utilization, and typically a maximum of 70% (factor 0.7) is applied.
- Unit correction
Unit correction is applied according to the server platform and the SPECjbb2015 run mode. For Composite max-jOPS, apply 29 for x86 servers, 31 for Unix servers, and 30 when the server type is unspecified. For MultiJVM max-jOPS, apply 24 for x86 servers, 26 for Unix servers, and 25 when unspecified.
| Category | Applied value | Explanation |
|---|---|---|
| Composite SPECjbb2015 | 29 | X86 server |
| Composite SPECjbb2015 | 30 | Server type unspecified (default) |
| Composite SPECjbb2015 | 31 | Unix server |
| MultiJVM SPECjbb2015 | 24 | X86 server |
| MultiJVM SPECjbb2015 | 25 | Server type unspecified (default) |
| MultiJVM SPECjbb2015 | 26 | Unix server |
DB Server CPU Sizing Based on Formula Calculation
Now, calculate the DB server’s CPU capacity using the formula.
Unlike Web/WAS servers, DB servers are sized in tpmC, derived from the number of transactions per minute.
| Calculation item | Calculation basis | Range | Default value |
|---|---|---|---|
| Transactions per minute | Total estimated number of transactions per minute on the target servers | - | Estimated value |
| Basic tpmC correction | Correction factor for adapting tpmC values measured in the test environment to the complex real-world environment | - | 5 |
| Peak-time load correction | Correction factor that accounts for peak times so the system operates smoothly during periods of heavy workload | 1.2 ~ 1.5 | 1.3 |
| Database size correction | Correction factor considering the number of records in database tables and the total database volume | 1.5 ~ 2.0 | 1.7 |
| Application structure correction | Correction factor considering performance differences based on the application's architecture and required response time | 1.1 ~ 1.5 | 1.2 |
| Application load correction | Correction factor that accounts for batch jobs running simultaneously during online processing peaks | 1.3 ~ 2.2 | 1.7 |
| Integration load correction | Correction factor that accounts for the workload generated by integration with other systems | 1 ~ 1.2 | 1 |
| Cluster correction | Correction factor for failure scenarios in a cluster environment | 1 ~ 1.5 | - |
| System margin | Additional buffer to account for unexpected workload increases, etc. | 1.3 | - |
| System target utilization | Maximum CPU utilization target for stable system operation | 0.7 | - |
| Calculation formula | CPU (tpmC) = (transactions per minute * basic tpmC correction * peak-time load correction * DB size correction * application structure correction * application load correction * integration load correction * cluster correction * system margin) / system target utilization | | |
| Core estimation | Estimated tpmC / tpmC per reference core | | |
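As with the Web/WAS formula, the DB sizing formula can be sketched as a small helper. This is an illustrative sketch; the 2,000 transactions-per-minute input is a hypothetical example, and the defaults are the typical values from the table above.

```python
def db_required_tpmc(tx_per_min, base_tpmc_corr=5, peak_corr=1.3,
                     db_size_corr=1.7, app_struct_corr=1.2, app_load_corr=1.7,
                     integration_corr=1.0, cluster_corr=1.0,
                     system_margin=1.3, target_util=0.7):
    """Estimated DB server CPU capacity in tpmC, per the formula above."""
    return (tx_per_min * base_tpmc_corr * peak_corr * db_size_corr
            * app_struct_corr * app_load_corr * integration_corr
            * cluster_corr * system_margin) / target_util

# Example: 2,000 transactions per minute with the default correction values
required = db_required_tpmc(2000)
```

Dividing `required` by the tpmC rating of a reference core then yields the core estimate, as in the last row of the table.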
- Transactions per minute
In a client/server environment, work typically occurs on a transaction basis. Therefore, in an OLTP (Online Transaction Processing) environment, estimating the number of transactions generated per application is the key criterion for determining system scale. There are three methods for calculating transactions per minute: investigating existing system transactions, estimating from the number of concurrent users, and estimating from the number of clients.
Investigation of Existing System Transactions
This method analyzes the production system's transactions on an annual or monthly basis and converts them into transactions per minute. Since existing systems usually already hold annual and monthly transaction data on application usage, it is effective to start from this data, considering the days and times when transactions occur. The occurrence pattern should also be analyzed: whether transactions occur every day of the month or only on about 20 business days excluding weekends, and for 8 hours versus 24 hours a day.

Concurrent user usage
If no previously measured transactions exist, such as when introducing a new system, use a concurrent-user-based estimation. This applies when it is difficult to anticipate the system's usage and the details of the application to be developed. First, estimate the total number of users and derive the number of concurrent users. Then estimate the number of transactions per minute a single concurrent user is expected to generate, considering the anticipated task types and characteristics; this value is "number of tasks × transactions per task". Finally, transactions per minute = concurrent users × transactions per user.

Client count usage
This method can be used when only the number of clients is known. How each client connects to the server and requests work must eventually be considered, but that is addressed in a later refinement phase; by default, assume all clients are on the same LAN. After estimating the number of concurrent clients from the total number of clients, calculate transactions per minute using the concurrent-user method described above.
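The concurrent-user estimation above reduces to a short calculation. The inputs (750 concurrent users, 2 tasks per minute, 3 transactions per task) are hypothetical values chosen for illustration.

```python
def tx_per_minute_from_users(concurrent_users, tasks_per_user_per_min, tx_per_task):
    """Transactions per minute = concurrent users x (tasks x transactions per task)."""
    return concurrent_users * tasks_per_user_per_min * tx_per_task

# Example: 750 concurrent users, each running 2 tasks/min at 3 transactions per task
tpm = tx_per_minute_from_users(750, 2, 3)
```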
- Basic tpmC correction
The tpmC values provided by TPC (Transaction Processing Performance Council) are measured in an optimal environment and differ from actual operating environments. Therefore, tpmC values measured in the test environment must be calibrated to the complex real-world environment; this is called basic tpmC correction. A fixed value of 5 is applied.
- Peak-time load correction
To improve work efficiency and obtain accurate, immediate results, the system must operate stably during peak times, when work is most concentrated. Therefore, system scale should be estimated based on peak time. Generally, the system receives approximately 20% to 50% more load during peak times than during normal times; considering this, apply a weight of 1.2 to 1.5.
| Category | applied value | Explanation |
|---|---|---|
| High | 1.5 | When an excessively high load occurs at a specific time or on a specific day |
| Medium | 1.4 | When excessive load occurs around a specific deadline |
| Low | 1.3 | When a peak time occurs daily or weekly during a specific time slot |
| Other | 1.2 | When a peak time exists but there is no significant load difference |
- Database size correction
The database size correction factor is determined from the record count of the largest table in the DB and the total DB volume. For databases of the same size, the one with more records receives the larger weight; if record counts are equal, the database with the larger volume receives the larger weight. However, since weighting is difficult without an accurate value derived from a detailed analysis of the actual business system, the general value of 1.7 is usually applied.
| DB size \ Record count | ~ 8 | ~ 32 | ~ 128 | ~ 256 | over 256 |
|---|---|---|---|---|---|
| less than 50 Gbyte | 1.50 | 1.55 | 1.60 | 1.65 | 1.70 |
| Less than 500 Gbyte | 1.60 | 1.65 | 1.70 | 1.75 | 1.80 |
| under 1 Tbyte | 1.70 | 1.75 | 1.80 | 1.85 | 1.90 |
| less than 2Tbyte | 1.80 | 1.85 | 1.90 | 1.95 | 1.95 |
| 2 TB or more | 1.85 | 1.90 | 1.90 | 1.95 | 2.00 |
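The correction table above is a two-dimensional band lookup, which can be sketched as follows. This is an illustrative sketch: the record-count bands are kept unitless, exactly as printed in the source table, and the function name is hypothetical.

```python
import bisect

# Upper bounds for the DB-size rows (GB) and the record-count columns
# (record counts are in the unitless bands printed in the table above).
SIZE_BOUNDS = [50, 500, 1024, 2048]   # <50GB, <500GB, <1TB, <2TB, then >=2TB
REC_BOUNDS = [8, 32, 128, 256]        # ~8, ~32, ~128, ~256, then above 256
CORR = [
    [1.50, 1.55, 1.60, 1.65, 1.70],
    [1.60, 1.65, 1.70, 1.75, 1.80],
    [1.70, 1.75, 1.80, 1.85, 1.90],
    [1.80, 1.85, 1.90, 1.95, 1.95],
    [1.85, 1.90, 1.90, 1.95, 2.00],
]

def db_size_correction(db_size_gb, record_count):
    """Look up the DB size correction factor from the banded table above."""
    row = bisect.bisect_right(SIZE_BOUNDS, db_size_gb)   # size bands are exclusive upper bounds
    col = bisect.bisect_left(REC_BOUNDS, record_count)   # record bands are inclusive upper bounds
    return CORR[row][col]
```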
- Application structure correction
Application structure correction is a correction value that accounts for performance differences based on the application's required response time. Response time here refers to the user-perceived service response time, not the server's internal response time. The applied values are shown in the table below; no correction is applied for response times of 5 seconds or more.
| Response time | 1 second | 2 seconds | 3 seconds | 4 seconds |
|---|---|---|---|---|
| Applied value | 1.50 | 1.35 | 1.20 | 1.10 |
- Application load correction
Application load correction accounts for cases where batch jobs run at the same time as peak online processing. When additional tasks are performed beyond the designated online work (batch tasks such as reporting or backup, or use by external systems), the required processing capacity must be adjusted accordingly, so this correction is applied according to the proportion of batch workload. As shown in the table below, apply up to 2.2 when there are many additional tasks such as batch jobs, as little as 1.3 when there are no additional tasks besides online transactions, and 1.7 as a typical value.
| Category | applied value | Explanation |
|---|---|---|
| High | 1.9 ~ 2.2 | When many additional tasks such as batch jobs are performed |
| Medium | 1.6 ~ 1.8 | When certain batch operations are performed within online transaction processing |
| Low | 1.3 ~ 1.5 | When there are no additional tasks, such as batch processing, besides online transactions |
- Integration load correction
This is a correction factor that accounts for the workload generated by integration with other systems, rather than by concurrent users. For the DB server, it is applied differently depending on the level of the integration workload, such as the volume of integration transactions.
| applied value | Explanation |
|---|---|
| 1 | When there is no linked task among the entire DB server workload (unreflected) |
| 1.1 | When the DB server integration task involves simple queries and data update integration (10% of total load) |
| 1.2 | When the DB server integration task involves large‑scale queries and data update integration (20% of total load) |
- Cluster correction
Cluster correction is applied when two or more systems are configured as a single cluster. If a failure occurs on one server, the remaining servers must handle the entire load required by the application, so an additional reserve margin must be provided; without it, overload makes normal operation difficult. The reserve ratio varies with the cluster configuration. In an Active-Active architecture, each peer system would ideally hold a 100% reserve, but since this is uneconomical and inefficient, a value of 1.3 to 1.5 is applied: 1.4 to 1.5 for a 2-Node configuration and 1.3 for a 3-Node configuration. In an Active-Standby architecture, one server handles the actual service while the other serves as a standby; in the event of a failure, the standby takes over the entire function of the affected server. No separate cluster correction is applied in this case.
| Clustering-Node | Applied value |
|---|---|
| Active-Active - 2 Node | 1.4 ~ 1.5 |
| Active-Active - 3 Node | 1.3 |
| Active-Standby | 1 |
- System margin
System margin is a correction value for stable operation even during unexpected workload increases or abnormal traffic. On-premises systems generally apply an additional 30% margin, i.e., a correction factor of 1.3.

- System target utilization
Generally, information systems are designed on the basis of a 100% target utilization rate, but to ensure stable operation, the actual utilization rate is managed so that it does not reach 100%. The maximum CPU utilization for stable system operation is called the system target utilization, and typically a maximum of 70% (factor 0.7) is applied.
CPU Sizing via Reference Method
The following is CPU sizing using the reference method.
The reference method calculates the capacity of the system to be built based on the resources of an existing business system.
The method for calculating capacity using the reference method is as shown in the table below.
| Calculation item | Content | Range | Default value |
|---|---|---|---|
| Existing CPU core count | Number of cores of the target server in the existing information system (CPUs * cores per CPU) | Calculated value | |
| Tier configuration | Apply a tier-change correction factor to the existing CPU core count | 0.5 ~ 3.0 | |
| Redundancy configuration | Apply a redundancy configuration correction factor | 0.7 ~ 2.0 | |
| Server type | Apply a correction factor according to the existing x86 server type (physical/virtualized) | 1.0 ~ 1.2 | |
| CPU average utilization | Average CPU utilization of the existing information system (apply 0.5 for 50%) | 1% ~ 100% | |
| CPU spare utilization | Correction factor for stable system operation | 1.3 | |
| Calculation formula | Capacity estimate = existing CPU core count * tier configuration correction * redundancy configuration correction * server type correction * average CPU utilization * spare utilization correction | | |
| Core estimation example | Existing CPU cores (4) * no tier change (1) * A-A redundancy (0.7) * physical-to-virtual server type (1.2) * 30% average CPU utilization (0.3) * 30% spare utilization (1.3) = approximately 1.3 cores | | |
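The reference-method formula is a straight product of correction factors, so it can be sketched directly. This is an illustrative sketch reproducing the worked example from the table above; the function name is hypothetical.

```python
def reference_cores(existing_cores, tier_corr=1.0, redundancy_corr=1.0,
                    server_type_corr=1.0, avg_cpu_util=1.0, spare_corr=1.3):
    """Capacity estimate per the reference-method formula in the table above."""
    return (existing_cores * tier_corr * redundancy_corr
            * server_type_corr * avg_cpu_util * spare_corr)

# Worked example from the table: 4 existing cores, no tier change (1),
# A-A redundancy (0.7), physical -> virtual (1.2), 30% average CPU
# utilization (0.3), 30% spare utilization (1.3)
estimate = reference_cores(4, 1.0, 0.7, 1.2, 0.3, 1.3)  # approximately 1.3 cores
```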
- Existing CPU core count
The calculation is based on the CPU cores used by the existing information system server. This reference method uses the number of CPUs and the number of cores per CPU, without considering the CPU's actual performance.
- Tier configuration
If the tier configuration of the existing servers changes, a correction factor is calculated taking load distribution into account. Calculate the correction value for each case where the number of tiers increases or decreases.
| Tier change | Applied value | Content |
|---|---|---|
| 1→2, 2→3 | 0.7 | (Web/WAS/DB)→(Web),(WAS/DB) or (Web/WAS),(DB) (Web),(WAS/DB) or (Web/WAS),(DB)→(Web),(WAS),(DB) |
| 1→3 | 0.5 | (Web/WAS/DB)→(Web),(WAS),(DB) |
| 2→1, 3→2 | 2.0 | (Web),(WAS/DB) or (Web/WAS),(DB)→(Web/WAS/DB) (Web),(WAS),(DB)→(Web),(WAS/DB) or (Web/WAS),(DB) |
| 3→1 | 3.0 | (Web),(WAS),(DB)→(Web/WAS/DB) |
- Redundancy configuration
If the redundancy configuration of the existing servers changes, a correction factor is applied accordingly. Calculate the correction value for each change in the redundancy configuration.
| Configuration change | Applied value | Content |
|---|---|---|
| 1→2 | 0.7 | Active–Active redundancy configuration correction factor |
| 1→2 | 1.0 | Active–Standby redundancy configuration correction factor: none |
| 2→1 | 2.0 | Change from an Active–Active redundant configuration to a single configuration |
- Server type
Apply a correction factor according to whether the existing information system server is a physical or virtual server, to account for virtualization overhead when migrating a physical server to the cloud.
| existing server | applied value | Content |
|---|---|---|
| Physical server | 1.2 | Apply the physical-to-virtual conversion correction factor due to cloud virtualization |
| Virtual server | 1.0 | No correction factor applied, since this is a virtual-to-virtual transition |
- Average CPU utilization
The average CPU utilization of the existing information system server is applied to reflect the actual computing usage of the existing server.
- CPU spare utilization
Apply a correction factor considering the target CPU utilization of the new server. For example, if the target average CPU utilization is 70%, apply a correction factor of 1.3, allowing for a 30% buffer.
Server Memory Sizing Based on Formula Calculation
Estimating memory size by formula is much simpler than CPU sizing.
Depending on the system being built, various approaches, such as the choice of programming language or the use of threads, are employed to reduce memory usage.
Sizing methods vary slightly with these strategies, and the number of processes running on the system and the memory they use significantly affect memory sizing.
However, this guideline estimates memory size based on the purpose and structure of a general system, without considering programming languages, thread usage, or the memory configuration characteristics of specific systems.
| Calculation items | Basis for calculation | Scope | default value |
|---|---|---|---|
| System area | Space required for OS, DBMS engine, middleware engine, and other utilities | Calculated value | |
| Memory required per user | Memory per user required for using the application, middleware, and DBMS | 1MB~3MB | 2MB |
| Number of concurrent users | Users who simultaneously use software or systems over a network | Calculated value | |
| OS buffer cache correction | Correction factor for a memory location that temporarily stores a certain amount of data to improve processing speed | 1.1~1.3 | 1.15 |
| Application required memory | Cache areas used by middleware such as the DBMS shared memory and the WAS heap size | Calculated value | |
| System margin | Correction factor for stable system operation | | 1.3 |
| Calculation formula | Memory (MB) = {System area + (Memory required per user × Number of concurrent users) + Application required memory} × OS buffer cache correction × System margin | | |
| Memory estimation example | {System area 256MB + (Memory required per user 64KB × Number of users 3,000) + Application required memory 300MB} × Buffer cache correction 1.15 × System margin 30% (1.3) | | |
System Area The system area refers to the memory space required for the execution of running software (operating systems, network daemons, database engines, middleware, utilities, etc.), and is calculated based on the memory required by each software when running. In particular, this area must be applied differently depending on the number of licenses for the software, such as databases, and is generally calculated by reflecting the required memory recommended by each software vendor.
Memory required per user Memory required per user refers to the memory capacity needed per user, depending on the use of applications, middleware, DBMS, and so on. The value is calculated considering various factors: for example, it may vary with the application implementation method, the middleware configuration, the user-process I/O structure, and the DBMS vendor's architecture. If it cannot be calculated, a value between 1MB and 3MB may be applied.
Concurrent users Concurrent users refer to users who use software or a system simultaneously on a network. The number of concurrent users is not calculated separately from a memory perspective; instead, the CPU-based concurrent user estimate from the previous step is applied.
OS Buffer Cache Correction To improve processing speed, computers collect a certain amount of data and process it all at once, and the storage location where this data is collected is called a buffer cache. The correction value considering this is called the OS buffer cache correction. OS buffer cache correction can use values from 1.1 to 1.3, and the default value is 1.15.
Application Required Memory Application required memory refers to the cache area used by middleware, such as the DBMS shared memory and the WAS heap size. The size of this memory is determined based on the requirements of each middleware, such as DBMS and WAS.
System Margin This is a correction value for stable system operation in response to unexpected increases in workload. For on-premise systems, we typically consider an additional 30% buffer (correction factor 1.3).
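The memory formula above can be checked with a short calculation using the values from the estimation example in the table (function and variable names are illustrative):

```python
def estimate_memory_mb(system_area_mb, per_user_kb, users,
                       app_memory_mb, buffer_cache=1.15, margin=1.3):
    """Memory (MB) = {system area + (per-user memory * users) + app memory}
                     * OS buffer cache correction * system margin"""
    per_user_total_mb = per_user_kb / 1024 * users
    base = system_area_mb + per_user_total_mb + app_memory_mb
    return base * buffer_cache * margin

# Values from the estimation example: 256MB system area, 64KB per user,
# 3,000 concurrent users, 300MB application required memory.
result = estimate_memory_mb(256, 64, 3000, 300)
print(round(result, 1))  # roughly 1,111.5 MB, i.e. about 1.1GB
```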
Container Application Review
Containers are one of the most widely used tools for application modernization.
Packaging an application and its runtime into a container lets you deploy it to any operating system platform; this platform independence simplifies software development, testing, and deployment and facilitates automation.
Containers are effective for building complex multi-tier applications.
For example, if you need to run an Application server, a database, and a message queue together, you can run each as a separate container image in parallel and configure communication between them.
Even if library versions differ across layers, you can run them on the same computing server without conflicts using containers.
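As an illustrative sketch of such a multi-tier setup (the service names and images below are placeholders, not part of the platform documentation), a Docker Compose file might look like this:

```yaml
# Hypothetical docker-compose.yml for the three-tier example:
# an application server, a database, and a message queue,
# each in its own container, communicating over a shared network.
services:
  app:
    image: my-app-server:latest   # placeholder image name
    depends_on:
      - db
      - queue
    ports:
      - "8080:8080"
  db:
    image: postgres:16            # example database
    environment:
      POSTGRES_PASSWORD: example  # demo only; use secrets in practice
  queue:
    image: rabbitmq:3             # example message queue
```

Each service keeps its own libraries inside its image, which is how version conflicts between tiers are avoided on a single host.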
Kubernetes is a platform that can efficiently manage and control multiple containers in production environments.
Kubernetes provides horizontal scaling capabilities and blue-green deployment features that minimize downtime.
Additionally, you can distribute user traffic load across containers and manage storage shared by various containers.
GPU Application Review
GPU Server lets you configure a virtual server by selecting the GPU card type and quantity according to the project's purpose and scale, and delivers physical-server-level performance by attaching GPUs in Pass-through mode.
The specifications of the provided NVIDIA GPUs are as follows; RHEL and Ubuntu are provided as operating systems.
| Category | V100 Type | A100 Type | H100 SXM |
|---|---|---|---|
| Service Delivery Method | Pass-through | Pass-through | Pass-through |
| GPU Architecture | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper |
| GPU Memory | 32GB | 80GB | 80GB |
| Transistors | 21.1 billion, 12nm TSMC | 54 billion, 7nm TSMC | 80 billion, 4N TSMC |
| Tensor Performance | 125 TFLOPS | 312 TFLOPS | 1,979 TFLOPS |
| Memory Bandwidth | 900 GB/sec | 2,000 GB/sec | 3.35 TB/sec HBM3 |
| CUDA Cores | 5,120 | 6,912 | 16,896 |
| Tensor Cores | 640 (1st generation) | 1,024 (3rd generation) | 528 (4th generation) |
| NVLink Performance | NVLink 2 | NVLink 3 | NVLink 4 |
| - Total bandwidth | 300 GB/s | 600 GB/s | 900 GB/s |
| - Link speed | 25 Gbps | 50 Gbps | 25 Gbps (x18) |
| NVSwitch Performance | | NVSwitch 2 | NVSwitch 3 |
| - Bandwidth | | 600 GB/s | 900 GB/s |
| - Aggregate bandwidth | | 9.6 TB/s | 7.2 TB/s |
| Linked Storage | Block Storage - SSD | Block Storage - SSD | Block Storage - SSD |
GPU servers equipped with NVIDIA V100, A100, and H100 are offered as server types with 1/2/4/8 GPUs, with NVLink and NVSwitch enabled on the virtualized computing resources.
The CPU:Memory combinations for the provided server types are 1:8 for V100, 1:15 for A100, and 1:20 for H100.
GPU Servers are suitable for workloads requiring fast computation speeds, such as AI model experimentation, prediction, and inference, and allow you to flexibly select and utilize resources with optimized performance based on the type and scale of your tasks.
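As a rough illustration of the CPU:Memory ratios above, the per-GPU figures below are derived from the GPU Server types (16 vCore/240GB per A100, 12 vCore/240GB per H100); the helper function itself is hypothetical:

```python
# Per-GPU resources derived from the GPU Server table:
# A100: 16 vCore and 240GB per GPU (1:15 CPU:Memory ratio),
# H100: 12 vCore and 240GB per GPU (1:20 CPU:Memory ratio).
PER_GPU = {
    "A100": {"vcore": 16, "memory_gb": 240},
    "H100": {"vcore": 12, "memory_gb": 240},
}

def server_spec(gpu_type, gpu_count):
    """Return (vCore, memory GB) for a GPU server with 1/2/4/8 GPUs."""
    if gpu_count not in (1, 2, 4, 8):
        raise ValueError("GPU servers are offered with 1, 2, 4, or 8 GPUs")
    spec = PER_GPU[gpu_type]
    return spec["vcore"] * gpu_count, spec["memory_gb"] * gpu_count

print(server_spec("A100", 4))  # (64, 960): 64 vCore, 960GB memory
print(server_spec("H100", 8))  # (96, 1920): the largest H100 type
```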
