
Disaster Recovery Plan

Architecture Design Based on Disaster Recovery Objectives

After deriving the required Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for each business function, determine the disaster recovery type accordingly and proceed with design and implementation.

Based on RTO and RPO, disaster recovery designs fall into three main types: Cold, Warm, and Hot.

| DR configuration level | RTO | RPO | Availability (Main↔DR) | Recovery method | Cost | Target systems |
|---|---|---|---|---|---|---|
| Cold Level | Several weeks | A few days | Active-Backup | Allocate resources, then restore from backup | Low | Non-critical systems |
| Warm Level | A few days | A few hours | Active-Replica | Manual failover after resource allocation and expansion | Medium | General systems |
| Hot Level | A few hours | 0 | Active-Standby | Manual failover | High | Critical systems |

Table. DR levels according to RTO/RPO objectives
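As a sketch of how the table above might drive an initial tier choice, the following function maps RTO/RPO targets (in hours) onto the three levels. The function name and thresholds are illustrative assumptions, not an SCP API.

```python
def select_dr_level(rto_hours: float, rpo_hours: float) -> str:
    """Map business RTO/RPO targets onto the Cold/Warm/Hot tiers.

    Thresholds are illustrative: Hot for near-zero RPO with an RTO of
    hours, Warm for RTO/RPO in the hours-to-days range, Cold otherwise.
    """
    if rpo_hours == 0 and rto_hours <= 24:
        return "Hot"
    if rpo_hours <= 24 and rto_hours <= 72:
        return "Warm"
    return "Cold"
```

For example, a critical system that tolerates no data loss maps to Hot, while a system that tolerates days of loss and weeks of downtime maps to Cold.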

Cold Level

The Cold Level method stores only the backup data of core services in the DR center and restores services based on this backup data when a disaster occurs.

This method has the advantage of having the lowest initial investment and maintenance costs, but it has the disadvantage of a high risk of data loss depending on the backup cycle.

Additionally, since the Cold Level method requires allocating and configuring new system resources at the DR center during disaster recovery, recovery can take a significant amount of time, making it suitable for low-priority workloads.

The figure below is an example of the Cold Level architecture.

In the event of a disaster, resources must first be allocated and expanded before the service can be restored, so data loss can occur and recovering the service may take a considerable amount of time.

Configuration Diagram

※ Cross-region VPC Peering, Object Storage Replication, and DBaaS Replica features are scheduled for release in 2026.

  1. Create a Virtual Server for DR in the kr-east1 Region (DR center) and leave it powered off during normal operation.

  2. Periodically back up data from the Virtual Server in the kr-west1 Region (main center) to Object Storage.

  3. For DBaaS, data is asynchronously replicated through a cross-region replica configuration; in a disaster scenario, the DR replica is promoted to master and used as the primary database.

  4. In the event of a disaster, restore the data from Object Storage (DR) in the kr-east1 Region (DR center) and resume the service.
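Because Cold Level restores from the most recent periodic backup, the data-loss window of an actual event depends on when the disaster strikes relative to the backup schedule. The helper below is an illustrative sketch (not an SCP API) that picks the restore point and computes that window.

```python
from datetime import datetime, timedelta

def pick_restore_point(backups, disaster_time):
    """Given timestamps of periodic backups in Object Storage, return
    the newest backup taken before the disaster and the resulting
    data-loss window (the effective RPO of that event)."""
    candidates = [b for b in backups if b <= disaster_time]
    if not candidates:
        raise ValueError("no backup available before the disaster")
    restore_point = max(candidates)
    return restore_point, disaster_time - restore_point

# Daily backups at 23:59; a disaster strikes the next day at noon.
backups = [datetime(2026, 1, d, 23, 59) for d in range(1, 4)]
restore_point, loss = pick_restore_point(backups, datetime(2026, 1, 4, 12, 0))
```

Here the service is restored to the January 3 backup, and about twelve hours of data are lost, which is why a short backup cycle matters even at the Cold Level.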

Warm Level

The Warm Level method is an approach that focuses on deploying systems with high service criticality to the DR center.

Since real-time replication between the primary center and the DR center is not performed, a periodic synchronization process is required.

For Object Storage, use the DR synchronization feature to perform bucket-level asynchronous replication from the Object Storage in the kr-west1 Region (main center) to the Object Storage (DR) in the kr-east1 Region (DR center).

Recovery therefore takes longer than with the Hot Level, but initial investment and maintenance costs are relatively lower.

Hot Level

The Hot Level method is a way to build a system in an Active-Standby state based on real-time replication.

This method is suitable for mission-critical systems because it halts replication when a disaster occurs and switches operations to the DR center, enabling rapid service resumption.

In a disaster scenario, the Object Storage (DR) bucket is accessed via its endpoint.

Configuration Diagram

  1. Connect kr-west1 Region (Primary Center) and kr-east1 Region (DR Center) via VPC Peering.

  2. For WEB/APP Virtual Servers, create a DR Virtual Server in the kr-east1 Region (DR center) through the Virtual Server DR service. Use the DR Virtual Server as the primary Virtual Server during a disaster or simulation training.

  3. Backup DR is a feature that can be enabled when creating the Backup service. When it is enabled, each backup performed at the primary site is replicated to and stored at the DR site.

  4. For File Storage, configure a replica volume in the kr-east1 Region (DR center) using the DR replication feature of the File Storage in the kr-west1 Region (Main center). After setting the replication cycle and synchronization policy, the volume is replicated. In the event of a disaster, synchronization is stopped, and the replicated volume is changed to R/W mode for use.

Conceptual Diagram
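The endpoint switch described above can be sketched as a small selection function. This is an illustrative sketch, not an SCP API; the endpoint URLs are hypothetical placeholders.

```python
# During a disaster, clients are pointed at the DR Object Storage
# bucket endpoint instead of the primary one. URLs are placeholders.
ENDPOINTS = {
    "kr-west1": "https://objectstorage.kr-west1.example.com/bucket",
    "kr-east1": "https://objectstorage.kr-east1.example.com/bucket",
}

def active_endpoint(disaster_declared: bool) -> str:
    """Return the endpoint to use: primary normally, DR after declaration."""
    region = "kr-east1" if disaster_declared else "kr-west1"
    return ENDPOINTS[region]
```

In practice this switch is usually driven by DNS or application configuration rather than application code, but the decision itself is this simple.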

Inter-region data replication for disaster recovery

Samsung Cloud Platform supports DR through various levels of storage replication.

Virtual Server DR

Virtual Server DR is a service that replicates Virtual Servers and their attached Block Storage to a Region different from the one currently in use, provides disaster recovery planning and testing, and offers recovery capabilities in the event of an actual disaster.

Only the Block Storage is actually replicated; the Virtual Server at the DR site remains in a stopped state.

The replication interval can be selected from 5 minutes, 1 hour, daily, weekly, or monthly; daily replication runs at 23:59:00, weekly replication runs on Sunday at 23:59:00, and monthly replication runs on the 1st at 23:59:00.
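The daily and weekly schedules stated above can be computed as follows. This is an illustrative sketch of the scheduling rule, assuming the documented run times (daily at 23:59:00, weekly on Sunday at 23:59:00); the function names are not an SCP API.

```python
from datetime import datetime, timedelta

def next_daily_run(now: datetime) -> datetime:
    """Next daily Virtual Server DR replication run (23:59:00)."""
    run = now.replace(hour=23, minute=59, second=0, microsecond=0)
    return run if now < run else run + timedelta(days=1)

def next_weekly_run(now: datetime) -> datetime:
    """Next weekly run: Sunday 23:59:00 (weekday(): Monday=0, Sunday=6)."""
    run = now.replace(hour=23, minute=59, second=0, microsecond=0)
    run += timedelta(days=(6 - now.weekday()) % 7)
    if run <= now:
        run += timedelta(days=7)
    return run
```

Knowing the next run time helps estimate the worst-case RPO of this replication policy: with a weekly schedule, up to a week of changes can be lost.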

Backup DR

Backup DR is a feature that can be enabled when creating the Backup service. When it is enabled, each backup performed at the primary site is replicated to and stored at the DR site.

Object Storage DR

Object Storage DR is configured through synchronization settings between the primary site bucket and the DR site bucket. To configure DR, versioning must be enabled on the primary site's bucket.

To configure replication, a peering connection must be established between the VPC of the primary site and the VPC of the DR site.

※ The cross-region Object Storage Replication feature is scheduled for release in 2026.

File Storage DR

File Storage DR can be configured from the primary site File Storage by setting the DR Region, the DR volume name, and the replication cycle. Once the replica volume is configured, changes at the primary site are synchronized to the replica and reflected there.

Conceptual Diagram
Figure. File Storage DR implementation concept

Database Service DR

In Database service DR, you can create and configure a replica of the primary site's master DB at the DR site. In the event of a disaster, manually promote the replica at the DR site to master and bring it online.

※ The cross-region DBaaS replication feature is scheduled for release in 2026.

Container Registry DR

When you use Container Registry DR, the DR registry and its Object Storage bucket are replicated to a different Region. This allows you to replicate the images of a Kubernetes Cluster from one Region to another and configure an identical Kubernetes Cluster; combined with File Storage DR, this enables Kubernetes Cluster DR.

※ The cross-region Container Registry feature is scheduled for release in 2026.

Establishing a Disaster Recovery Failover Plan

When a service outage occurs and, based on the assessment of the incident severity and the estimated recovery time, recovery is not possible within the predefined time, a disaster is declared and the disaster recovery procedures are carried out.

The stages of disaster recovery are as follows.

| Step | Activity | Key responsibilities |
|---|---|---|
| Disaster declaration | Disaster status assessment | Establish the response headquarters; send emergency notifications; operate the situation room; assess the current disaster status; determine the estimated recovery time of the main center; prepare a report for the chief executive |
| Disaster declaration | Decision to switch to the disaster recovery system | Decide on the switchover considering the estimated recovery and return times; control the disaster recovery system switchover procedures |
| Disaster recovery activities | Service transition to the disaster recovery center | Confirm service restart; prepare for long-term operation at the disaster recovery center |
| Disaster recovery activities | Main center recovery | Urge hardware and software suppliers to restore; establish a procurement plan if recovery is impossible (preliminary actions followed by procurement approval); control the disaster recovery transition and report final service verification; prepare internal and external reports and presentation materials; estimate the main center recovery timeline and develop an operation plan for the recovery center |
| Main center recovery | Decision to return to the main center | Prepare the return plan and decide the timing; verify stabilization of the main center; confirm the service transition for the return; identify service details and issues after the transition; control the disaster recovery system return procedure |

Table. Disaster recovery stages
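The stages above can be sketched as a minimal state machine plus the declaration rule (recovery not possible within the predefined time). Stage names, function names, and the threshold are illustrative assumptions for this sketch.

```python
# Fixed order of disaster recovery stages, following the table above.
STAGES = [
    "normal operation",
    "disaster declared",
    "running at DR center",
    "main center recovered",
    "returned to main center",
]

def advance(current: str) -> str:
    """Move to the next disaster recovery stage; the order is fixed."""
    i = STAGES.index(current)
    if i == len(STAGES) - 1:
        raise ValueError("already back at the main center")
    return STAGES[i + 1]

def should_declare_disaster(estimated_recovery_hours: float,
                            threshold_hours: float) -> bool:
    """Declare a disaster only when the main center cannot be restored
    within the predefined window (threshold is organization-specific)."""
    return estimated_recovery_hours > threshold_hours
```

Encoding the order explicitly prevents, for example, returning to the main center before its stabilization has been verified.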

Service Change Management

Maintaining Consistency Between the Primary Site and DR Site

Best practices
Ensure that the same change operations are performed on the primary site and the DR site.

When updates, patches, or similar actions are performed on the primary site, the infrastructure, applications, and configuration of the DR environment may change. If configuration differences then exist between the primary site and the DR site, the primary site's functionality may not operate as intended at the DR site during disaster recovery execution.

Design principles
  1. Do not make changes directly on the main site; instead, make changes through the test/staging environment.
  2. Set up a test/staging environment to validate changes first, and then apply them to both the primary site and the DR site to improve deployment consistency and reliability.

Change Management through Automation

If you perform service changes manually, various variables may arise. As a result, the system may not function properly when performing disaster recovery. Therefore, you should automate the deployment process to minimize the impact of such potential errors.

Design principles
  1. Manage the process from development to deployment through continuous integration and continuous delivery (CI/CD).
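One way to catch the configuration drift described above is to compare the recorded configuration of both sites after each change. The function and the configuration keys below are illustrative assumptions, not an SCP feature.

```python
def config_drift(primary: dict, dr: dict) -> dict:
    """Compare primary-site and DR-site configuration and report any
    drift, e.g. after a patch was applied to only one side. Returns a
    mapping of drifted keys to (primary value, DR value) pairs."""
    keys = primary.keys() | dr.keys()
    return {k: (primary.get(k), dr.get(k))
            for k in keys if primary.get(k) != dr.get(k)}

# Example: the application was patched on the primary site only.
primary = {"app_version": "2.4.1", "tls": "1.3", "replicas": 3}
dr = {"app_version": "2.3.9", "tls": "1.3", "replicas": 3}
drift = config_drift(primary, dr)
```

In a CI/CD pipeline, a non-empty drift report can fail the deployment, forcing the same change to be applied to both sites.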

Incident/Disaster Response Test

Best practices
Periodically run failure or disaster scenarios to test the DR system.

When a disaster occurs, there must be procedures for switching to the DR site and returning to the primary center, and you should regularly verify that these procedures operate correctly.

During a simulation exercise, fault or disaster scenarios are assumed in order to test both the system and the response procedures.

The key items to check during a disaster recovery drill are as follows.

  • Whether data in the disaster recovery system has been successfully recovered
  • Command and coordination system of the recovery team
  • Internal/external communication status
  • Disaster recovery system performance
  • Main center return validation
  • Notification procedures and other matters

Design principles
  1. By assuming a failure or disaster scenario, the team actually carries out the required tasks to enhance response capabilities and identify improvement measures.
  2. Execute the switchover procedures according to the disaster recovery plan and verify that the automatic switchover process operates correctly.

The disaster recovery exercise plan should detail the schedule, the organization and participants, and the scope and scenarios of the drill, and should be documented down to the level of system commands. In addition, a checklist for each task, along with the responsible personnel and emergency contact information, must be documented.

The table below is an example of disaster recovery drill procedures and execution details.
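Before a drill is signed off, its report can be checked against the key check items listed above. This is a hypothetical sketch; the item labels and function name are assumptions made for illustration.

```python
# Key check items, condensed from the drill checklist above.
KEY_CHECK_ITEMS = {
    "data recovery",
    "command and coordination",
    "communication",
    "system performance",
    "main center return",
    "notification procedures",
}

def missing_check_items(reported: set) -> set:
    """Return the key items a drill report failed to cover."""
    return KEY_CHECK_ITEMS - reported

# Example: a drill report that skipped three of the six items.
report = {"data recovery", "communication", "system performance"}
gaps = missing_check_items(report)
```

A non-empty result means the drill must be repeated or extended before the DR procedures can be considered verified.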

| Order | Drill step | Tasks performed | Responsible department |
|---|---|---|---|
| 1 | Preliminary preparation | Identify business impact; discuss schedule and method; prepare and approve the detailed work plan; inspect the disaster recovery system and address deficiencies | Person in charge of related work |
| 2 | Disaster declaration | Declare and announce the disaster (main center, disaster recovery center) | Emergency response team |
| 3 | Disaster recovery system operation | Carry out disaster recovery system activation tasks, including DB, server, application, and network | System, network, and business management staff |
| 4 | Work test | Conduct self-tests and assess normal operation | Person in charge of each task |
| 5 | Transition of the disaster recovery system to live operation | During a mock drill, do not perform the actual work transition | System, network, and operations staff |
| 6 | Normal status monitoring | Monitor whether the disaster recovery center is performing its duties | System, network, and task owners |
| 7 | Disaster recovery system shutdown | Shut down the disaster recovery system | System, network, and operations staff |
| 8 | Return to work | Conduct the main center return operation | System, network, and task owners |
| 9 | Result summary | Organize the schedule, procedures, and drill results; identify and address pending issues | Person in charge of related work |

Table. Example disaster recovery drill procedure (TTA, Information System Disaster Recovery Guidelines)