Disaster Recovery Planning
Architecture Design Based on Disaster Recovery Objectives
After deriving the required Recovery Time Objective (RTO) and Recovery Point Objective (RPO) levels for each business function, you must determine the disaster recovery type based on these objectives and proceed with design and implementation.
Disaster recovery designs can be categorized into three main types (Cold, Warm, and Hot) based on RTO and RPO.
| DR configuration level | RTO | RPO | Configuration (Main↔DR) | Recovery method | Cost | Target systems |
|---|---|---|---|---|---|---|
| Cold Level | A few weeks | A few days | Active-Backup | Resource allocation and backup restoration | Low | Non-critical systems |
| Warm Level | A few days | A few hours | Active-Replica | Manual fail-over after resource allocation and expansion | Medium | General systems |
| Hot Level | Several hours | 0 | Active-Standby | Manual fail-over | High | Mission-critical systems |
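Given the table above, the choice of DR tier follows mechanically from the required RTO and RPO. The sketch below illustrates that mapping; the function name and the hour thresholds are illustrative assumptions, not platform-defined values.

```python
# Illustrative sketch: map required RTO/RPO (in hours) to the DR tiers
# from the table above. Thresholds are assumptions for demonstration.

def select_dr_tier(rto_hours: float, rpo_hours: float) -> str:
    """Pick the DR tier needed to meet the stated objectives."""
    # Hot: near-zero RPO, recovery within hours
    if rpo_hours < 1 or rto_hours <= 12:
        return "Hot"
    # Warm: RPO of a few hours, RTO of a few days
    if rpo_hours <= 24 or rto_hours <= 72:
        return "Warm"
    # Cold: everything looser than that
    return "Cold"
```

A mission-critical system requiring near-zero data loss lands on Hot, while an archive system that tolerates days of loss lands on Cold.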
Cold Level
The Cold Level method stores only the backup data of core services in the DR center and restores services based on this backup data when a disaster occurs.
This method has the advantage of having the lowest initial investment and maintenance costs, but it has the disadvantage of a high risk of data loss depending on the backup cycle.
Additionally, since the Cold Level method requires allocating and configuring new system resources at the DR center during disaster recovery, recovery can take a significant amount of time, making it suitable for low-priority workloads.
The figure below is an example of the Cold Level architecture.
In the event of a disaster, resources for the remaining systems must be allocated and expanded before the service is restored, so data loss can occur and recovering the service may take a considerable amount of time.
※ Cross-region VPC Peering, Object Storage Replication, and DBaaS Replica features are scheduled for release in 2026.
Create a Virtual Server for DR in the kr-east1 Region (DR center) and leave it powered off during normal operations.
Periodically back up data from the Virtual Server in the kr-west1 Region (Main center) to Object Storage.
In the case of DBaaS, data is asynchronously replicated through a cross‑region replica configuration, and in a disaster scenario, the DR replica is promoted to master and used as the primary database.
In the event of a disaster, recover the data in Object Storage (DR) within the kr-east1 Region (DR center) to resume the service.
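At the Cold level, the achievable RPO is bounded by the backup cycle: in the worst case, a disaster strikes just before the next scheduled backup, so everything written since the last completed backup is lost. A minimal sketch of that relationship (function names are illustrative):

```python
# Illustrative: at Cold level the worst-case data loss (effective RPO)
# equals the backup interval, since a disaster can strike just before
# the next scheduled backup completes.

def worst_case_rpo_hours(backup_interval_hours: float) -> float:
    return backup_interval_hours

def meets_rpo(backup_interval_hours: float, required_rpo_hours: float) -> bool:
    """A backup cycle satisfies the RPO only if it is no longer than it."""
    return worst_case_rpo_hours(backup_interval_hours) <= required_rpo_hours
```

For example, daily backups (24 h interval) satisfy a 72 h RPO but not a 4 h RPO; this is why the Cold level is reserved for non-critical systems.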
Warm Level
The Warm Level method is an approach that focuses on deploying systems with high service criticality to the DR center.
Since real-time replication between the primary center and the DR center is not performed, a periodic synchronization process is required.
For Object Storage, we use the DR synchronization feature to perform bucket‑level asynchronous replication from the Object Storage in the kr‑west1 Region (primary center) to the Object Storage (DR) in the kr‑east1 Region (DR center).
However, compared to the Hot Level method, it has the advantage of relatively lower initial investment and maintenance costs.
Hot Level
The Hot Level method is a way to build a system in an Active-Standby state based on real-time replication.
This method is suitable for mission-critical systems because it halts replication when a disaster occurs and switches operations to the DR center, enabling rapid service resumption.
In a disaster scenario, the Object Storage (DR) bucket is accessed via its endpoint.
Connect kr-west1 Region (Primary Center) and kr-east1 Region (DR Center) via VPC Peering.
For WEB/APP Virtual Servers, create a DR Virtual Server in the kr-east1 Region (DR center) through the Virtual Server DR service. Use the DR Virtual Server as the primary Virtual Server during a disaster or simulation training.
Backup DR is a feature that can be enabled when creating a service. When Backup DR is enabled, when a backup is performed on the primary site, the backup copy is replicated and stored on the DR site.
For File Storage, configure a replica volume in the kr-east1 Region (DR center) using the DR replication feature of the File Storage in the kr-west1 Region (Main center). After setting the replication cycle and synchronization policy, the volume is replicated. In the event of a disaster, synchronization is stopped, and the replicated volume is changed to R/W mode for use.
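The Hot-level failover sequence described above (stop replication, promote the DR copies to read-write/master, switch traffic) can be sketched as simple bookkeeping. The state keys and values below are illustrative, not a platform API.

```python
# Illustrative failover bookkeeping for the Hot-level steps above:
# stop replication, promote DR resources, switch the active endpoint.
# Resource/state names are hypothetical.

def fail_over(site: dict) -> dict:
    """Return a copy of a DR-site description transitioned to the active role."""
    promoted = dict(site)
    promoted["replication"] = "stopped"           # halt sync from the primary
    promoted["file_storage_mode"] = "read-write"  # replica volume to R/W
    promoted["db_role"] = "master"                # promote the DBaaS replica
    promoted["serving_traffic"] = True            # DNS/endpoint switch
    return promoted

dr_site = {
    "region": "kr-east1",
    "replication": "running",
    "file_storage_mode": "read-only",
    "db_role": "replica",
    "serving_traffic": False,
}
active = fail_over(dr_site)
```

Keeping the original `dr_site` description unchanged makes the same structure reusable for failback rehearsals.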
Inter-region data replication for disaster recovery
Samsung Cloud Platform supports DR through various levels of storage replication.
Virtual Server DR
Virtual Server DR is a service that replicates Virtual Servers and their attached Block Storage to a Region different from the one currently in use, provides disaster recovery planning and testing, and offers recovery capabilities in the event of an actual disaster.
Only the Block Storage is actually replicated; the Virtual Server at the DR site remains in a stopped state until failover.
The replication interval can be selected from 5 minutes, 1 hour, daily, weekly, or monthly; daily replication runs at 23:59:00, weekly replication runs on Sunday at 23:59:00, and monthly replication runs on the 1st at 23:59:00.
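The schedule described above can be computed as follows; a sketch assuming naive local datetimes, with helper names chosen for illustration.

```python
from datetime import datetime, timedelta

# Illustrative: compute the next replication run for the intervals listed
# above (daily at 23:59, weekly on Sunday at 23:59, monthly on the 1st
# at 23:59).

def next_daily(now: datetime) -> datetime:
    run = now.replace(hour=23, minute=59, second=0, microsecond=0)
    return run if run > now else run + timedelta(days=1)

def next_weekly(now: datetime) -> datetime:
    run = next_daily(now)
    while run.weekday() != 6:   # Monday=0 ... Sunday=6
        run += timedelta(days=1)
    return run

def next_monthly(now: datetime) -> datetime:
    run = now.replace(day=1, hour=23, minute=59, second=0, microsecond=0)
    while run <= now:
        # jump to the 1st of the following month
        run = (run + timedelta(days=32)).replace(day=1)
    return run
```

Note that the gap between a disaster and the most recent completed run bounds the data loss, just as the backup cycle does at the Cold level.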
Backup DR
Backup DR is a feature that can be enabled when creating a service. When it is enabled, each backup performed at the primary site is replicated to and stored at the DR site.
Object Storage DR
Object Storage DR is configured through synchronization settings between the primary site bucket and the DR site bucket. To configure DR, versioning must be enabled on the primary site's bucket.
※ The cross-region Object Storage Replication feature is scheduled for release in 2026.
File Storage DR
File Storage DR can be configured from the primary site File Storage by setting the DR Region, the DR volume name, and the replication cycle.
Database Service DR
In Database service DR, you can create and configure a replica of the primary site's master DB at the DR site. To configure the replica, a VPC peering connection must be established between the VPC of the primary site and the VPC of the DR site. Once the replica is configured, changes on the primary site are synchronized to the replica. In the event of a disaster, manually promote the replica at the DR site to master and bring it online.
※ The cross-region DBaaS replication feature is scheduled for release in 2026.
Container Registry DR
When you use Container Registry DR, the DR registry and its Object Storage bucket are replicated to a different Region. This allows you to replicate the container images used by a Kubernetes Cluster from one Region to another and configure an identical Kubernetes Cluster. Combined with File Storage DR, this makes it possible to implement Kubernetes Cluster DR.
※ The cross-region Container Registry feature is scheduled for release in 2026.
Establishing a Disaster Recovery Failover Plan
When a service outage occurs, a disaster is declared and the disaster recovery procedures are carried out if, based on the assessed incident severity and the estimated recovery time, recovery is not possible within the predefined window.
The stages of disaster recovery are as follows.
| Step | Activity | Responsibilities |
|---|---|---|
| Disaster declaration | Disaster status assessment | |
| Disaster declaration | Decision to transition to the disaster recovery system | |
| Disaster recovery activities | Service transition to the disaster recovery center | |
| Disaster recovery activities | Main center recovery | |
| Main center recovery | Decision to return to the main center | |
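The disaster-declaration criterion above (declare when the estimated recovery time exceeds the predefined window for the assessed severity) can be sketched as follows; the severity labels and thresholds are assumptions for demonstration.

```python
# Illustrative: declare a disaster when, given the incident severity,
# the estimated recovery time exceeds the predefined allowable window.
# Severity thresholds are assumptions, not platform-defined values.

ALLOWED_OUTAGE_HOURS = {"critical": 1, "major": 4, "minor": 24}

def should_declare_disaster(severity: str, estimated_recovery_hours: float) -> bool:
    return estimated_recovery_hours > ALLOWED_OUTAGE_HOURS[severity]
```

In practice this check would feed the "Disaster status assessment" and "Decision to transition" activities in the table above.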
Service Change Management
Maintaining Consistency Between the Primary Site and DR Site
When updates, patches, or similar changes are made, the infrastructure, applications, and configuration of the primary site may diverge from those of the DR environment. As a result, the system may not function properly when disaster recovery is performed. Therefore, do not make changes directly on the primary site; validate them first in a test/staging environment, and then apply them to both the primary site and the DR site to improve deployment consistency and reliability.
Change Management through Automation
If service changes are performed manually, unexpected variations can arise. Consequently, if configuration differences exist between the primary site and the DR site, the primary site's functionality may not operate as intended at the DR site during disaster recovery. Therefore, automate the deployment process to minimize the impact of such potential errors.
Manage the entire process from development to deployment through continuous integration and continuous delivery (CI/CD).
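One way to catch primary/DR configuration drift before it bites during a drill is a simple diff of the deployed configuration, for example as a CI/CD pipeline step. A sketch follows; the dictionary keys are illustrative, and in practice the inputs would be exported from each site's deployment state.

```python
# Illustrative drift check between primary-site and DR-site configuration.

def config_drift(primary: dict, dr: dict) -> dict:
    """Return keys whose values differ, or that exist on only one site."""
    drift = {}
    for key in primary.keys() | dr.keys():
        p, d = primary.get(key), dr.get(key)
        if p != d:
            drift[key] = {"primary": p, "dr": d}
    return drift
```

An empty result means the two sites agree on the checked keys; any non-empty result is a deployment-consistency defect to fix before the next exercise.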
Incident/Disaster Response Test
When a disaster occurs, establish procedures for switching to the DR site and returning to the primary center, and regularly verify that these procedures operate correctly.
During a simulation exercise, we assume fault or disaster scenarios to test the system and response procedures.
The key items to check during a disaster recovery drill are as follows.
- Whether data in the disaster recovery system has been successfully recovered
- Command and coordination system of the recovery team
- Internal/External communication status
- Disaster recovery system performance
- Main center return validation
- Notification procedures and other miscellaneous matters
By assuming a failure or disaster scenario and actually carrying out the required tasks, the team enhances its response capabilities and identifies improvement measures. Execute the switchover procedures according to the disaster recovery plan and verify that the switchover process operates correctly.
The disaster recovery exercise plan should detail the schedule, organization and participants, the training scope and scenarios, and be drafted down to the level of system commands.
Additionally, a checklist for each task, along with the responsible personnel and emergency contact information, must be documented.
The table below is an example of disaster recovery training procedures and execution details.
| Order | Training step | Tasks performed | Responsible department |
|---|---|---|---|
| 1 | Preliminary preparation | | Related task owners |
| 2 | Disaster declaration | | Emergency response team |
| 3 | Disaster recovery system operation | | System, network, business management |
| 4 | Work test | | Task owners |
| 5 | Disaster recovery system transition to live operations | | System, network, operations |
| 6 | Normal-status monitoring | | System, network, task owners |
| 7 | Disaster recovery system shutdown | | System, network, operations |
| 8 | Return to work | | System, network, task owners |
| 9 | Result summary | | Related task owners |





