
Computing Design

Computing Services, Server Types, and Sizing

Choosing the Right Compute Service for Your Workload

The computing service specifications provided by Samsung Cloud Platform are as follows.

| Product | Type | CPU | Memory | Option | Option value |
|---|---|---|---|---|---|
| Virtual Server | Standard | 1/2/4/6/8/10 vCore | 2~160GB | Max Network Bandwidth | 10Gbps |
| Virtual Server | Standard | 12/14/16 vCore | 24~256GB | Max Network Bandwidth | 12.5Gbps |
| Virtual Server | High Capacity | 24/32/48/64/72/96/128 vCore | 48~1,536GB | Max Network Bandwidth | 25Gbps |
| GPU Server | A100(80G) | 16/32/64/128 vCore | 240~1,920GB | GPU | 1~8 |
| GPU Server | H100(80G) | 12/24/48/96 vCore | 240~1,920GB | GPU | 1~8 |
| Bare Metal Server | 3rd Gen | 16/32/64/96/128 vCore | 64~2,048GB | Physical CPU | 8~64 |

Table. Virtual Server server types

Note: You can check the latest server types on the pages below.
  • Virtual Server: https://cloud.samsungsds.com/serviceportal/services/compute/virtualServer.html
  • GPU Server: https://cloud.samsungsds.com/serviceportal/services/compute/gpuServer.html
  • Bare Metal Server: https://cloud.samsungsds.com/serviceportal/services/compute/baremetal.html

Figure. Virtual Server, GPU Server, Bare Metal Server (conceptual diagram)
  • Virtual Server: Virtual Server provides a Standard (s1) type of up to 16 vCores and a High Capacity (h2) type of 24 vCores or more. The Standard type uses Intel Ice Lake CPUs, with 1 vCore/2GB as the minimum specification; from 2 vCore up to 16 vCore, CPU:Memory ratios of 1:2, 1:4, 1:8, and 1:16 are offered. The High Capacity type uses Intel Sapphire Rapids CPUs and offers CPU:Memory ratios of 1:2, 1:4, 1:8, and 1:12, from 24 vCore to 128 vCore. Supported operating systems include RHEL, Ubuntu, Alma, Rocky, Oracle Linux, and Windows Server, and images such as Kubernetes images and Data Service Console images can be configured. A Virtual Server can be used for various purposes, such as development, testing, and application execution, depending on the user's computing needs.

  • Bare Metal Server: A Bare Metal Server is a high-performance cloud computing service that does not use virtualization technology and exclusively allocates physically isolated computing resources such as CPU and memory. The 3rd generation service, using Intel Sapphire Rapids CPUs, is currently available. The provided server types offer CPU:Memory ratios of 1:4, 1:8, and 1:16. The default internal disk for the OS is 480GB*2 for 16 vCore, 960GB*2 for 32 vCore, and 1.92TB*2 for 96/128 vCore. Bare Metal Servers suit workloads requiring high capacity and high performance, such as real-time systems, HPC (High Performance Computing), and I/O-heavy servers. Additionally, the Multi-Attach feature can be used to configure databases that require Active-Active high availability.

Server Sizing

After selecting a computing service suitable for the workload, you must determine the server specifications and quantity based on availability and performance requirements.

While determining server specifications and quantities was a critical process in on-premises environments, in cloud environments, it becomes a flexible task that can be changed at any time.

This is because adjustments can be made later, even if there is a difference between the initially set specifications and the actual required specifications.

Nevertheless, server sizing is important because we must calculate the workload operating cost (monthly fee) in the cloud and, based on that, derive the TCO (Total Cost of Ownership) compared to on‑premises deployment.

To estimate the hardware scale of an information system, the following three methodologies can be considered.

| Category | Concept | Advantage | Disadvantage |
|---|---|---|---|
| Formula calculation method | Calculates capacity values from factors such as user count and applies correction factors. | Clearly presents the basis for the estimate and is simpler to calculate than other methods. | An incorrect correction factor causes a large discrepancy from the target value, and accurate supporting data for the correction factors is difficult to provide. |
| Reference method | Estimates a comparable system size by comparing approximate scales against baseline data for the workload (user count, DB size). | Comparison with an existing, already-implemented business system allows a relatively safe estimate. | Relies on comparison rather than calculation, so the justification is weak. |
| Simulation method | Models and simulates the target workload to estimate its scale. | Produces relatively accurate values. | Time-consuming and costly. |

Table. Server sizing methods

The formula calculation and reference methods estimate the resource usage of servers to be deployed on the cloud from various metrics.

In general, cloud capacity planning finds the capacity balance point through simulation or by adjusting during operations.

However, sizing is often required up front, for reasons such as cost estimation or proposals.

The formula calculation method can provide objective capacity design criteria because it derives server capacity from various metrics.

Web/WAS CPU Sizing Based on Formula Calculation

First, calculate the CPU capacity of the Web/WAS server using the formula.

| Calculation item | Calculation basis | Range | Default |
|---|---|---|---|
| Number of concurrent users | Users who simultaneously use the software or system over a network | - | Estimated value |
| Operations per user | Number of business logic operations generated per second by a single user | 3 ~ 6 | 5 |
| Basic OPS correction | Correction factor for adapting the OPS (Operations Per Second) measured in a test environment to a complex real-world environment | - | 3 |
| Business-use correction | Correction factor based on the type of target system | Web: 0.7 / WAS: 2 | - |
| Interface load correction | Correction factor for the load generated by interfaces when communicating between servers (commonly 1.1) | 1.1 ~ 1.2 | 1.1 |
| Peak-time load correction | Correction factor to absorb load caused by a sudden surge of connections | 1.2 ~ 1.5 | 1.3 |
| Integration load correction | Correction factor for the workload generated by integration with other systems | 1 ~ 1.3 | 1 |
| Cluster correction | Correction factor for failure scenarios in a cluster environment (applied by node count) | 2-Node: 1.4~1.5 / 3-Node: 1.3 | - |
| System margin | Correction factor for stable system operation; additional margin for unexpected workload increases | 1.3 | - |
| System target utilization | Maximum CPU utilization target for stable system operation | 0.7 | - |
| Unit correction | Conversion factor for converting the calculation result to max-jOPS units | 24 ~ 31 | - |

Calculation formula: CPU (max-jOPS) = (concurrent users × operations per user × basic OPS correction × business-use correction × interface load correction × peak-time load correction × integration load correction × cluster correction × system margin) / (system target utilization × unit correction)

Core estimation: estimated jOPS / jOPS per reference core
  • jOPS per reference core varies between 1,000 and 3,000 depending on the hardware
  • If the calculated jOPS is 5,000 and the jOPS per reference core is 1,500, the estimate is 5,000/1,500 ≈ 3.3 cores, so a 4-core Virtual Server type is selected

Table. Web/WAS server CPU sizing by formula calculation
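As a quick sanity check, the formula and core estimate above can be sketched in Python. The parameter defaults mirror the table's default values; the function and parameter names are illustrative, not part of any SDS tooling.

```python
import math

def webwas_cpu_jops(concurrent_users, ops_per_user, base_ops=3,
                    business=2.0, interface=1.1, peak=1.3, coupling=1.0,
                    cluster=1.0, margin=1.3, target_util=0.7, unit=30):
    """CPU requirement in max-jOPS, per the Web/WAS formula in the table."""
    load = (concurrent_users * ops_per_user * base_ops * business *
            interface * peak * coupling * cluster * margin)
    return load / (target_util * unit)

def cores_needed(estimated_jops, jops_per_core):
    """Round the core estimate up to the next whole core."""
    return math.ceil(estimated_jops / jops_per_core)

# The table's example: 5,000 jOPS at 1,500 jOPS per reference core
print(cores_needed(5_000, 1_500))  # 4 -> choose a 4-core server type
```

Because Virtual Server types come in fixed vCore steps, the rounded core count is then matched to the nearest larger server type.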
  • Concurrent users
    Concurrent users are users who simultaneously use the software or system over a network, typically defined on a session basis (from business service request to service termination). First, estimate the total number of system users. The total user count usually means all users registered in the system or, more generally, all users with access permissions; for web systems, however, unspecified users can connect, so the total must be estimated. Next, calculate the number of connected users as a specific percentage of the total. A connected user is online and may generate transactions or operations, or may simply be connected. Finally, estimate the number of concurrent users by multiplying the connected user count by a specific ratio. In a 3-tier web application, the user counts of the web server, WAS server, and DB server are closely related: the concurrent users of the WAS server will not exceed those of the web server, and those of the DB server will not exceed those of the WAS server. Taking these relationships into account, you can estimate the concurrent users for each tier. The table below shows typical concurrent-user estimates for information systems.
| Category | Concept |
|---|---|
| Web server (external service) | Estimate connected users as roughly 1%~10% of total users, and concurrent users as roughly 5%~30% of connected users. |
| Web server (large content service) | Estimate connected users as about 30%~50% of total users, and concurrent users as 40%~70% of connected users. |
| WAS server | 50%~100% of the web server's estimated concurrent users; 75% is typical. |
| DB server | 50%~100% of the WAS server's estimated concurrent users; 75% is typical. |

Table. Concurrent user estimation
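The tier relationships above can be chained in a short Python sketch. The ratios are sample values drawn from the table's ranges, and the function name is illustrative.

```python
def tier_concurrency(total_users, connected_ratio=0.05, web_ratio=0.20,
                     was_ratio=0.75, db_ratio=0.75):
    """Estimate concurrent users per tier from the total user count."""
    connected = total_users * connected_ratio   # online users
    web = connected * web_ratio                 # concurrent on the web server
    was = web * was_ratio                       # never exceeds web concurrency
    db = was * db_ratio                         # never exceeds WAS concurrency
    return {"web": round(web), "was": round(was), "db": round(db)}

# 200,000 registered users, external service (5% connected, 20% concurrent)
print(tier_concurrency(200_000))  # {'web': 2000, 'was': 1500, 'db': 1125}
```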
  • Operations per user
    Operations per user is the number of business logic operations generated by a single user per second, assumed to be approximately 3 to 6 depending on the business type.

| Applied value | Explanation |
|---|---|
| 3 | Web-service-focused tasks (query-oriented work rather than complex application logic) |
| 4 | Web service and application logic mixed, but primarily web-service work |
| 5 | Web service and application logic mixed |
| 6 | Application-logic-centric tasks |

Table. Operations per user
  • Basic OPS correction The OPS figures provided by SPEC (Standard Performance Evaluation Corporation) are measured in an optimal environment and differ from actual operating environments. The OPS values measured in a test environment must therefore be corrected to suit the complex real-world environment; this is called the basic OPS correction. A fixed value of 3 is applied.

  • Business-use correction There is a relative difference in workload between web servers and WAS servers. Taking this difference into account, different correction values are applied depending on the system type; this is called the business-use correction. Apply a correction factor of 0.7 for web servers and 2 for WAS servers.

| Applied value | Explanation |
|---|---|
| 0.7 | Web server |
| 2 | WAS server |

Table. Business-use correction
  • Peak-time load correction To operate efficiently and deliver accurate, immediate results, the system must run stably during peak times, when the workload is most concentrated. System scale should therefore be estimated for peak time. A system typically receives approximately 20% to 50% more load at peak times than at normal times; considering this, apply a weight of 1.2 to 1.5 to the calculated capacity.

| Category | Applied value | Explanation |
|---|---|---|
| High | 1.5 | An excessively high load occurs at a specific time or on a specific day |
| Medium | 1.4 | Excessive load occurs at a specific deadline |
| Low | 1.3 | A peak time occurs daily or weekly during a specific time slot |
| Other | 1.2 | A peak time exists but the load difference is small |

Table. Peak-time load correction
  • Cluster correction Cluster correction applies when multiple systems are configured as a single cluster. If a server fails, the remaining servers must handle the entire load required by the application; without a reserve margin the system becomes overloaded and cannot operate normally, so an additional reserve margin must be provided. The reserve ratio varies with the cluster configuration. In an Active-Active architecture, each peer system would ideally hold a 100% reserve, but since this is uneconomical and inefficient, a value of 1.3 ~ 1.5 is applied: 1.4 ~ 1.5 for a 2-Node configuration and 1.3 for a 3-Node configuration. In an Active-Standby architecture, one node runs the actual service while the other serves as a standby; if a failure occurs, the standby takes over the full functionality of the failed node. No separate cluster correction is applied for Active-Standby.

| Clustering | Node | Applied value |
|---|---|---|
| Active-Active | 2-Node | 1.4 ~ 1.5 |
| Active-Active | 3-Node | 1.3 |
| Active-Standby | - | 1 |

Table. Cluster correction
  • Integration load correction
    This correction factor accounts for the workload generated by integration with other systems, rather than load from concurrent users. Since inter-system integrations generally connect to the WAS server rather than the web server, a correction value of 1 is applied to web servers. For WAS servers, a correction factor is applied as follows, depending on the frequency and processing complexity of integration transactions.

| Category | Applied value | Explanation |
|---|---|---|
| Web server | 1 | Web server |
| WAS | 1 | No integration tasks in the overall WAS workload (0%) |
| WAS | 1.1 | Integration tasks are simple queries only (10% of total load) |
| WAS | 1.2 | Integration tasks are internal updates only (20% of total load) |
| WAS | 1.3 | Integration tasks include internal/external updates (30% of total load) |

Table. Integration load correction
  • System margin The system margin is a correction value for stable operation even in the event of unexpected workload increases or abnormal traffic. On-premises systems generally apply an additional 30% margin, i.e., a correction factor of 1.3.

  • System Target Utilization Rate Generally, information systems are designed based on a target utilization rate of 100%, but to ensure stable system operation, the actual utilization rate is managed so that it does not reach 100%. The maximum CPU utilization for stable system operation is called the system target utilization, and typically a maximum of 70% (factor 0.7) is applied.

  • Unit correction
    Unit correction is a conversion value applied according to the server type and SPECjbb2015 run mode. For Composite max-jOPS, apply 29 for x86 servers, 31 for Unix servers, and 30 as the general value. For MultiJVM max-jOPS, apply 24 for x86 servers, 26 for Unix servers, and 25 as the general value.

| Category | Applied value | Explanation |
|---|---|---|
| Composite SPECjbb2015 | 29 | x86 server |
| Composite SPECjbb2015 | 30 | Server type unspecified (default) |
| Composite SPECjbb2015 | 31 | Unix server |
| MultiJVM SPECjbb2015 | 24 | x86 server |
| MultiJVM SPECjbb2015 | 25 | Server type unspecified (default) |
| MultiJVM SPECjbb2015 | 26 | Unix server |

Table. Unit correction

DB Server CPU Sizing Based on Formula Calculation

Now, calculate the DB server’s CPU capacity using the formula.

Unlike Web/WAS servers, DB servers derive and compute tpmC based on the number of transactions per minute.

| Calculation item | Calculation basis | Range | Default |
|---|---|---|---|
| Transactions per minute | Total estimated per-minute transaction count of the target servers (number of tasks: 2, transactions per task: 4~6) | - | Estimated value |
| Basic tpmC correction | Correction factor for adapting tpmC values measured in a test environment to the complex real-world environment | - | 5 |
| Peak-time load correction | Correction factor accounting for peak times so the system operates smoothly under heavy workload | 1.2 ~ 1.5 | 1.3 |
| Database size correction | Correction factor considering the record count of database tables and the total database volume | 1.5 ~ 2.0 | 1.7 |
| Application structure correction | Correction factor for performance differences based on application architecture and required response time | 1.1 ~ 1.5 | 1.2 |
| Application load correction | Correction factor for batch jobs that run simultaneously during online-processing peak times | 1.3 ~ 2.2 | 1.7 |
| Integration load correction | Correction factor for the workload generated by integration with other systems | 1 ~ 1.2 | 1 |
| Cluster correction | Correction factor for failure scenarios in a cluster environment | 2-Node: 1.4~1.5 / 3-Node: 1.3 | - |
| System margin | Additional buffer for unexpected workload increases | 1.3 | - |
| System target utilization | Maximum CPU utilization target for stable system operation | 0.7 | - |

Calculation formula: CPU (tpmC) = (transactions per minute × basic tpmC correction × peak-time load correction × DB size correction × application structure correction × application load correction × integration load correction × cluster correction × system margin) / system target utilization

Core estimation: estimated tpmC / tpmC per reference core
  • tpmC per reference core varies from 70,000 to 400,000 depending on the hardware
  • If the calculated tpmC is 700,000 and the tpmC per reference core is 190,000, the estimate is 700,000/190,000 ≈ 3.7 cores, so a 4-core server type is selected

Table. DB server CPU sizing by formula calculation
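The DB calculation can be sketched in Python the same way. The parameter defaults follow the table's default values; the function and parameter names are illustrative.

```python
import math

def db_cpu_tpmc(tx_per_minute, base=5, peak=1.3, db_size=1.7,
                app_struct=1.2, app_load=1.7, coupling=1.0,
                cluster=1.0, margin=1.3, target_util=0.7):
    """CPU requirement in tpmC, per the DB server formula in the table."""
    return (tx_per_minute * base * peak * db_size * app_struct *
            app_load * coupling * cluster * margin) / target_util

def cores_needed(estimated_tpmc, tpmc_per_core):
    """Round the core estimate up to the next whole core."""
    return math.ceil(estimated_tpmc / tpmc_per_core)

# The table's example: 700,000 tpmC at 190,000 tpmC per reference core
print(cores_needed(700_000, 190_000))  # 4 -> choose a 4-core server type
```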
  • Transactions per minute In a client/server environment, work typically occurs on a transaction basis. Therefore, in an OLTP (Online Transaction Processing) environment, estimating the number of transactions generated per application is the key criterion for determining system scale. There are three methods for calculating the number of transactions per minute: investigating existing system transactions, estimating based on the number of concurrent users, and estimating based on the number of clients.

    Investigation of Existing System Transactions
    This method involves analyzing transactions for the production system on an annual or monthly basis and converting them into transactions per minute for use. Generally, since existing systems already possess annual and monthly transaction data regarding Application usage, it is effective to start the calculation based on this data by considering the days and times of transaction occurrence. At this point, an analysis of the occurrence pattern should also be performed, such as whether transactions occur daily throughout a month, only during approximately 20 days excluding weekends, or for 8 hours versus 24 hours each day.

    Concurrent user usage
    If no previously measured transactions exist, such as when introducing a new system, use a concurrent-user-based estimation. This applies when the system's expected usage and the details of the application to be developed are difficult to pin down. First, estimate the total number of users and derive the number of concurrent users. Then estimate the transactions per minute that a single concurrent user is expected to generate, considering the anticipated task types and characteristics; this is calculated as "number of tasks × transactions per task". Finally, transactions per minute = concurrent users × transactions per user.

    Client count usage
    This method can be used when only the number of clients is known. You must consider how clients connect to the server and request work, but that is addressed in a later refinement phase. By default, assume all clients are on the same LAN. Then, after estimating the number of concurrent clients from the total client count, calculate the transactions per minute using the concurrent-user method explained above.
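When estimating from concurrent users, the per-user transaction rate described above reduces to one multiplication. The task counts are sample values from the table's guidance (2 tasks, 4~6 transactions per task); the function name is illustrative.

```python
def transactions_per_minute(concurrent_users, tasks=2, tx_per_task=5):
    """tpm = concurrent users x (tasks x transactions per task)."""
    return concurrent_users * tasks * tx_per_task

# 1,000 concurrent users, each running 2 tasks of 5 transactions per minute
print(transactions_per_minute(1_000))  # 10000
```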

  • Basic tpmC correction The tpmC values provided by TPC are measured in an optimal environment and differ from actual operating environments. The tpmC values measured in a test environment must therefore be corrected to suit the complex real-world environment; this is called the basic tpmC correction. A fixed value of 5 is applied.

  • Peak-time load correction To improve work efficiency and obtain accurate, immediate results, the system must operate stably during peak times when work is most concentrated. Therefore, when estimating system scale, you should base it on peak time. Generally, the system receives approximately 20% to 50% more load during peak times compared to normal times. Considering this, adjust the system capacity by applying a weight of 1.2 to 1.5.

| Category | Applied value | Explanation |
|---|---|---|
| High | 1.5 | An excessively high load occurs at a specific time or on a specific day |
| Medium | 1.4 | Excessive load occurs at a specific deadline |
| Low | 1.3 | A peak time occurs daily or weekly during a specific time slot |
| Other | 1.2 | A peak time exists but the load difference is small |

Table. Peak-time load correction
  • Database size correction The database size correction factor is determined by the record count of the largest table in the DB and the total DB volume. For databases of the same size, the one with more records gets the larger weight; for equal record counts, the database with the larger volume gets the larger weight. However, since weights are hard to apply without an accurate value derived from a detailed analysis of the actual business system, the general value of 1.7 is usually applied.

| DB size \ Record count | ~ 8 | ~ 32 | ~ 128 | ~ 256 | 256 or more |
|---|---|---|---|---|---|
| Under 50 GB | 1.50 | 1.55 | 1.60 | 1.65 | 1.70 |
| Under 500 GB | 1.60 | 1.65 | 1.70 | 1.75 | 1.80 |
| Under 1 TB | 1.70 | 1.75 | 1.80 | 1.85 | 1.90 |
| Under 2 TB | 1.80 | 1.85 | 1.90 | 1.95 | 1.95 |
| 2 TB or more | 1.85 | 1.90 | 1.90 | 1.95 | 2.00 |

Table. Database size correction
  • Application structure correction Application structure correction accounts for performance differences based on application response time. Response time here means the user's service response time, not the server's response time. The applied values are shown below; no correction is applied when the response time exceeds 5 seconds.

| Response time | 1 second | 2 seconds | 3 seconds | 4 seconds |
|---|---|---|---|---|
| Applied value | 1.50 | 1.35 | 1.20 | 1.10 |

Table. Application structure correction
  • Application load correction Application load correction accounts for batch jobs that run simultaneously during online-processing peak times. When tasks beyond the designated online work are performed (batch tasks such as reporting or backup, or use of external systems), the required processing capacity must be adjusted accordingly, so this correction is applied according to the proportion of batch workload. As shown below, apply up to 2.2 when many additional tasks such as batch jobs run, a minimum of 1.3 when there are no additional tasks besides online transactions, and 1.7 as a typical value.

| Category | Applied value | Explanation |
|---|---|---|
| High | 1.9 ~ 2.2 | Many additional tasks such as batch jobs are performed |
| Medium | 1.6 ~ 1.8 | Some batch operations are performed within online transaction processing |
| Low | 1.3 ~ 1.5 | No additional tasks such as batch processing besides online transactions |

Table. Application load correction
  • Integration load correction This correction factor accounts for the workload generated by integration with other systems, rather than load from concurrent users. For the DB server, it is applied according to the level of integration workload, such as integration transactions.

| Applied value | Explanation |
|---|---|
| 1 | No integration tasks in the overall DB server workload (not reflected) |
| 1.1 | Integration involves simple queries and data updates (10% of total load) |
| 1.2 | Integration involves large-scale queries and data updates (20% of total load) |

Table. Integration load correction
  • Cluster correction Cluster correction is applied when multiple systems are configured as a single cluster. If a server fails, the remaining servers must handle the entire load required by the application; without a margin the system becomes overloaded and cannot operate normally, so an additional margin must be provided. The reserve ratio varies with the cluster configuration. In an Active-Active architecture, each peer system would ideally hold a 100% reserve, but since this is uneconomical and inefficient, a value of 1.3 to 1.5 is applied: 1.4 ~ 1.5 for a 2-Node configuration and 1.3 for a 3-Node configuration. In an Active-Standby architecture, one node handles the actual service while the other serves as a standby; in the event of a failure, the standby takes over the full functionality of the failed node. No separate cluster correction is applied for Active-Standby.

| Clustering | Node | Applied value |
|---|---|---|
| Active-Active | 2-Node | 1.4 ~ 1.5 |
| Active-Active | 3-Node | 1.3 |
| Active-Standby | - | 1 |

Table. Cluster correction
  • System Margin
    System margin is a correction value for stable operation, even during unexpected workload increases or abnormal traffic situations. On-premise systems generally apply an additional 30% margin, i.e., a correction factor of 1.3.

  • System Target Utilization Rate
    Generally, information systems are designed based on a target system utilization rate of 100%, but to ensure stable operation, the actual utilization rate is managed so that it does not reach 100%. Thus, the maximum CPU utilization for stable system operation is referred to as the system target utilization, and typically, a maximum of 70% (coefficient 0.7) is applied.

CPU Sizing via Reference Method

The following is CPU sizing using the reference method.

The reference method calculates the capacity of the system to be built based on the resources of an existing business system.

The method for calculating capacity using the reference method is as shown in the table below.

| Calculation item | Content | Range | Default |
|---|---|---|---|
| Existing CPU core count | Number of cores of the target server in the existing information system (CPUs × cores per CPU) | - | Calculated value |
| Tier configuration | Tier-configuration correction factor relative to the existing CPU core count | 0.5 ~ 3.0 | - |
| Redundancy configuration | Redundancy-configuration correction factor | 0.7 ~ 2.0 | - |
| Server type | Correction factor according to the existing x86 server form: physical servers get 1.2 (virtualization overhead); virtualized servers get no correction | - | - |
| CPU average utilization | Average CPU utilization of the existing information system (apply 0.5 for 50%) | 1% ~ 100% | - |
| CPU idle utilization | Correction factor for stable system operation | - | 1.3 |

Calculation formula: capacity estimate = existing CPU core count × tier-configuration correction × redundancy-configuration correction × server-type correction × average CPU utilization × idle-utilization correction

Core estimation example: existing CPU cores (4) × no tier change (1) × A-A redundancy (0.7) × physical → virtual server type (1.2) × 30% average CPU utilization (0.3) × 30% idle utilization (1.3) ≈ 1.3 cores
  • 2 cores are selected considering the available server types

Table. CPU sizing via the reference method
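The reference-method formula can be sketched as follows; the defaults and the worked example follow the table above, and the names are illustrative.

```python
import math

def reference_sizing(existing_cores, tier=1.0, redundancy=1.0,
                     server_type=1.0, avg_util=1.0, headroom=1.3):
    """Capacity estimate in cores, per the reference-method formula."""
    return (existing_cores * tier * redundancy * server_type *
            avg_util * headroom)

# Table example: 4 cores, no tier change, A-A redundancy (0.7),
# physical -> virtual (1.2), 30% average utilization, 30% headroom (1.3)
estimate = reference_sizing(4, redundancy=0.7, server_type=1.2, avg_util=0.3)
print(round(estimate, 1), math.ceil(estimate))  # 1.3 2
```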
  • Number of existing CPU cores The calculation is based on the CPU cores utilized by the existing information system server. This reference method calculates based on the number of CPUs and the number of cores per CPU, without considering the CPU’s actual performance.

  • Hierarchy Configuration If the hierarchical configuration of the existing server changes, we calculate a correction factor taking load balancing into account. Calculate the correction values respectively when the number of layers increases or decreases.

| Tier change | Applied value | Content |
|---|---|---|
| 1→2, 2→3 | 0.7 | (Web/WAS/DB) → (Web),(WAS/DB) or (Web/WAS),(DB); (Web),(WAS/DB) or (Web/WAS),(DB) → (Web),(WAS),(DB) |
| 1→3 | 0.5 | (Web/WAS/DB) → (Web),(WAS),(DB) |
| 2→1, 3→2 | 2.0 | (Web),(WAS/DB) or (Web/WAS),(DB) → (Web/WAS/DB); (Web),(WAS),(DB) → (Web),(WAS/DB) or (Web/WAS),(DB) |
| 3→1 | 3.0 | (Web),(WAS),(DB) → (Web/WAS/DB) |

Table. Tier configuration
  • Redundancy configuration
    If the redundancy configuration of the existing server changes, apply a correction factor that accounts for load balancing. Calculate the correction value for each case where redundancy is added or removed.

| Configuration change | Applied value | Content |
|---|---|---|
| 1→2 | 0.7 | Active-Active redundancy configuration |
| 1→2 | 1.0 | Active-Standby redundancy configuration (no correction) |
| 2→1 | 2.0 | Change from an Active-Active redundant configuration to a single configuration |

Table. Redundancy configuration
  • Server type Apply a correction factor based on whether the existing information system server is physical or virtual, to account for virtualization overhead when migrating a physical server to the cloud.

| Existing server | Applied value | Content |
|---|---|---|
| Physical server | 1.2 | Physical-to-virtual conversion correction due to cloud virtualization |
| Virtual server | 1.0 | No correction; already a virtualized environment |

Table. Server type
  • Average CPU utilization Measure the computing usage of the existing server based on the average CPU utilization of the existing information system server.

  • CPU idle utilization Apply a correction factor considering the target CPU utilization when configuring a new server. For example, if the target average CPU utilization is 70%, apply a correction factor of 1.3, considering a 30% buffer.

Server Memory Sizing Based on Formula Calculation

The method for estimating memory size based on formula calculation is much simpler compared to the CPU.

Depending on the system being built, various techniques, such as the choice of programming language or the use of threads, are employed to reduce memory usage.

Depending on these strategies, sizing methods vary slightly, and the number of processes running on the system and the amount of memory they use significantly affect memory sizing.

However, this guideline estimates memory size based on the purpose and structure of a general system, without considering programming languages, thread usage, or the memory configuration characteristics of specific systems.

| Calculation item | Calculation basis | Range | Default |
|---|---|---|---|
| System area | Space required for the OS, DBMS engine, middleware engine, and other utilities | - | Calculated value |
| Memory required per user | Memory per user required for the application, middleware, and DBMS | 1MB ~ 3MB | 2MB |
| Number of concurrent users | Users who simultaneously use the software or system over a network | - | Calculated value |
| OS buffer cache correction | Correction factor for the memory area that temporarily buffers data to improve processing speed | 1.1 ~ 1.3 | 1.15 |
| Application required memory | Cache areas used by middleware, such as DBMS shared memory and the WAS heap size | - | Calculated value |
| System margin | Correction factor for stable system operation | - | 1.3 |

Calculation formula: memory (MB) = {system area + (memory required per user × number of users) + application required memory} × buffer cache correction × system margin

Memory estimation example: {system area 256MB + (memory required per user 64KB × 3,000 users) + application required memory 300MB} × buffer cache correction 1.15 × system margin 1.3
  • The result of the above formula is approximately 1,111.5MB
  • Memory can then be matched to a server type

Table. Server memory sizing by formula calculation
  • System Area The system area refers to the memory space required for the execution of running software (operating systems, network daemons, database engines, middleware, utilities, etc.), and is calculated based on the memory required by each software when running. In particular, this area must be applied differently depending on the number of licenses for the software, such as databases, and is generally calculated by reflecting the required memory recommended by each software vendor.

  • Memory required per user Memory required per user refers to the memory capacity required per user depending on the usage of applications, middleware, DBMS, and so on. This value is calculated considering various factors. For example, the required memory per user may vary depending on the application implementation method, middleware application method, user process I/O structure, DBMS vendor’s architecture, etc. However, if it cannot be calculated, you may apply a value between 1MB and 3MB (default 2MB).

  • Concurrent users Concurrent users refer to users who use software or a system simultaneously on a network. The number of concurrent users is not calculated separately from a memory perspective; instead, the CPU-based concurrent user estimate from the previous step is applied.

  • OS Buffer Cache Correction To improve processing speed, computers collect a certain amount of data and process it all at once, and the storage location where this data is collected is called a buffer cache. The correction value considering this is called the OS buffer cache correction. OS buffer cache correction can use values from 1.1 to 1.3, and the default value is 1.15.

  • Application Required Memory Application required memory refers to the cache area used by middleware, such as the DBMS shared memory and the WAS heap size. The size of this memory is determined based on the requirements of each middleware, such as DBMS and WAS.

  • System Margin This is a correction value for stable system operation in response to unexpected increases in workload. For on-premise systems, we typically consider an additional 30% buffer (correction factor 1.3).
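The formula above can be sketched directly in code. The function name is ours; the figures follow the estimation example in the table (256MB system area, 64KB per user, 3,000 concurrent users, 300MB application memory). Note that recomputing that example with these figures yields approximately 1,111.5MB:

```python
def estimate_memory_mb(system_area_mb, per_user_mb, concurrent_users,
                       app_memory_mb, buffer_cache=1.15, margin=1.3):
    """Memory (MB) = {system area + per-user memory * concurrent users
    + application required memory} * OS buffer cache correction * system margin."""
    base = system_area_mb + per_user_mb * concurrent_users + app_memory_mb
    return base * buffer_cache * margin

# Example from the table: 64KB per user = 64/1024 MB.
print(round(estimate_memory_mb(256, 64 / 1024, 3000, 300), 2))  # 1111.53
```

The result is then mapped onto the closest available server type, rounding up.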

Container Application Review

Containers are one of the most widely used tools for application modernization.

Packaging an application and its runtime into a container lets you deploy it to any operating system platform; this platform independence simplifies software development, testing, and deployment and facilitates automation.

Containers are effective for building complex multi-tier applications.

For example, if you need to run an Application server, a database, and a message queue together, you can run each as a separate container image in parallel and configure communication between them.

Even if library versions differ across layers, you can run them on the same computing server without conflicts using containers.
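The multi-tier example above could be sketched, for instance, with Docker Compose. The service layout and image names below are illustrative assumptions, not part of this guide:

```yaml
# Hypothetical three-tier composition: app server, database, message queue.
services:
  app:
    image: my-app-server:1.0   # illustrative image name
    depends_on: [db, queue]
    environment:
      DB_HOST: db              # containers reach each other by service name
      QUEUE_HOST: queue
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
  queue:
    image: rabbitmq:3
```

Each service runs as a separate container, so the database and message queue can carry different library versions than the application server without conflict.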

Kubernetes is a platform that can efficiently manage and control multiple containers in production environments.

Kubernetes provides horizontal scaling capabilities and blue-green deployment features that minimize downtime.

Additionally, you can distribute user traffic load across containers and manage storage shared by various containers.
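As one hedged illustration of those Kubernetes capabilities, a Deployment provides the horizontal scaling and a Service distributes traffic across its Pods. All names and the image below are hypothetical:

```yaml
# Hypothetical Deployment: scale horizontally by adjusting replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-server
spec:
  replicas: 3                        # horizontal scaling: Pod count
  selector:
    matchLabels: {app: app-server}
  template:
    metadata:
      labels: {app: app-server}
    spec:
      containers:
        - name: app
          image: my-app-server:1.0   # illustrative image
---
# Service: load-balances user traffic across the Deployment's Pods.
apiVersion: v1
kind: Service
metadata:
  name: app-server
spec:
  selector: {app: app-server}
  ports: [{port: 80, targetPort: 8080}]
```

Blue-green deployment follows the same pattern: run a second Deployment with the new version and switch the Service selector when it is ready.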

GPU Application Review

GPU Server lets you configure a virtual server by selecting the GPU card type and quantity to match the project’s purpose and scale, and uses the Pass-through method to deliver GPU performance at the physical server level.

The specifications of the provided NVIDIA GPU are as follows, and RHEL and Ubuntu are provided as operating systems.

Category | V100 Type | A100 Type | H100 SXM
Service delivery method | Pass-through | Pass-through | Pass-through
GPU architecture | NVIDIA Volta | NVIDIA Ampere | NVIDIA Hopper
GPU memory | 32GB | 80GB | 80GB
Transistors | 21.1 billion (12nm TSMC) | 54 billion (7nm TSMC) | 80 billion (4N TSMC)
Tensor performance (FP16) | 125 TFLOPS | 312 TFLOPS | 1,979 TFLOPS
Memory bandwidth | 900 GB/s | 2,000 GB/s | 3.35 TB/s (HBM3)
CUDA cores | 5,120 | 6,912 | 16,896
Tensor cores | 640 (1st generation) | 432 (3rd generation) | 528 (4th generation)
NVLink generation | NVLink 2 | NVLink 3 | NVLink 4
Total NVLink bandwidth | 300 GB/s | 600 GB/s | 900 GB/s
Signaling rate | 25 Gbps | 50 Gbps | 25 Gbps (x18)
NVSwitch generation | - | NVSwitch 2 | NVSwitch 3
NVSwitch GPU-to-GPU bandwidth | - | 600 GB/s | 900 GB/s
Total aggregated bandwidth | - | 9.6 TB/s | 7.2 TB/s
Linked storage | Block Storage - SSD | Block Storage - SSD | Block Storage - SSD
Table. GPU types

GPU servers equipped with NVIDIA V100, A100, and H100 are provided as server types with 1/2/4/8 GPUs, with NVSwitch and NVLink enabled on the virtualized computing resources.

The CPU:Memory combinations for the provided server types are 1:8 for V100, 1:15 for A100, and 1:20 for H100.

GPU Servers are suitable for workloads requiring fast computation speeds, such as AI model experimentation, prediction, and inference, and allow you to flexibly select and utilize resources with optimized performance based on the type and scale of your tasks.
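As a rough helper built from the CPU:Memory ratios listed above (the function and dictionary names are illustrative, not part of the service), server memory for each GPU type can be derived from the vCore count:

```python
# CPU:Memory ratios for the GPU server types listed above.
MEMORY_RATIO = {"V100": 8, "A100": 15, "H100": 20}

def server_memory_gb(gpu_type: str, vcores: int) -> int:
    """Return the memory (GB) paired with a given vCore count."""
    return vcores * MEMORY_RATIO[gpu_type]

# e.g. the smallest A100 type (16 vCore) and H100 type (12 vCore):
print(server_memory_gb("A100", 16))  # 240
print(server_memory_gb("H100", 12))  # 240
```

Both results match the 240GB entry point of the A100 and H100 server types.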