Operations Planning
Identifying and Defining Operational Requirements
Operational excellence refers to the continuous improvement activities that keep applications running stably with minimal downtime, in order to maximize business value and improve system efficiency.
The operations team is responsible for managing the application's infrastructure, security, and all software-related issues so that the application operates reliably.
For enterprise applications in particular, availability must be clearly defined in a Service Level Agreement (SLA), so the operations team must fully understand the business requirements and be able to respond quickly to the events that may arise.
First, assess the legal and regulatory factors that could affect operations, based on the organization's industry and the work it performs.
It is advisable to review compliance with the Security Design Principles (Security Design Principles > I. Security Requirements Analysis and Design Principles > 1. Compliance and Security Requirements).
Among the various compliance requirements, the Information Security Management System (ISMS) is a certification that companies or organizations operating above a certain scale must obtain when providing services to end users or consumers, and it can be considered the most common and essential one.
The table below summarizes the ISMS control items in the operations management area.
When establishing operational excellence design principles, review the change management, performance management, and incident management items especially closely.
| Category | Control items | Explanation |
|---|---|---|
| Information system acquisition and development security | Separation of test and production environments | Development and test systems must, in principle, be separated from the production system to reduce the risk of unauthorized access to and modification of the production system. |
| Information system acquisition and development security | Source program management | Source programs must be managed so that only authorized users can access them, and they should not be stored in the production environment. |
| Information system acquisition and development security | Migration to production environment | When transferring a newly introduced, developed, or modified system to the production environment, a controlled procedure must be followed, and the execution code must be run in accordance with testing and user acceptance procedures. |
| System and Service Operations Management | Change Management | Establish and implement procedures to manage all changes to information system assets, and analyze the impact of changes on system performance and security before implementation. |
| System and Service Operations Management | Performance and Incident Management | To ensure information system availability, performance and capacity requirements must be defined, the current status continuously monitored, and procedures for detection, logging, analysis, recovery, and reporting to effectively respond to incidents must be established and managed. |
Typically, source code is managed using configuration management tools such as Git and SVN, and these tools provide features such as access permission settings, version control, and change history tracking.
When moving a system that has completed development to the production environment, the migration must be performed by a person other than the developer.
| Category | Control items | Explanation |
|---|---|---|
| Service transition | Change Management | Manage all changes to IT services and infrastructure according to controlled procedures to minimize service interruptions and business risks caused by changes. |
| Service transition | Release and Deployment Management | Plan and control the entire lifecycle of safely and successfully deploying and transferring approved changes to the production environment. |
| Service operation | Incident Management | When unexpected service interruptions or quality degradation (failures) occur, quickly detect, record, analyze, and restore the service to minimize business impact. |
| Service operation | Problem Management | Identify the root cause of incidents and establish measures to prevent recurrence, thereby proactively preventing incidents and ensuring long-term stability. |
| Service operation | Service Level Management | Continuously monitor, measure, report, and carry out improvement activities to ensure that the service level objectives defined in the SLA (such as availability, performance, etc.) are being met. |
| Service operation | Capacity Management | Secure and manage the IT resources and performance needed for current and future business requirements cost‑effectively, and monitor to prevent any performance degradation. |
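
The Service Level Management row above refers to availability objectives defined in an SLA. As a rough illustration of how such an objective translates into a downtime budget, the sketch below converts an availability percentage into allowed minutes of downtime and back. The function names and the 30-day measurement period are assumptions made for this example; an actual SLA defines its own measurement window and exclusions.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    availability_pct: target availability, e.g. 99.9 for "three nines".
    period_days: measurement window (30 days is an assumption, not an SLA rule).
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def measured_availability_pct(downtime_minutes: float, period_days: int = 30) -> float:
    """Return the availability percentage achieved for the observed downtime."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    # 120 minutes of downtime in 30 days falls short of a 99.9% target.
    print(measured_availability_pct(120) >= 99.9)
```

Comparing the measured value against the target each reporting period is one simple way to feed the monitoring and reporting activities listed in the table.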
Separation of Test and Production Environments
The incident response history must be documented, including the incident occurrence date and time, severity, assignee and responsible person, incident details and cause, actions and recovery details, and preventive measures, and it should be managed in the form of an incident handling report.
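The fields of an incident handling report listed above can be captured in a simple record type. The following is a minimal sketch, not a prescribed format: the class name, field names, and the `missing_fields` helper are hypothetical and only mirror the items named in the text.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class IncidentReport:
    """One incident handling report; field names are illustrative."""
    occurred_at: datetime        # incident occurrence date and time
    severity: str                # e.g. "critical", "major", "minor" (assumed scale)
    assignee: str                # person handling the incident
    responsible_person: str      # person accountable for the affected system
    details: str                 # what happened
    cause: str                   # identified cause
    actions_taken: str           # response actions
    recovery_details: str        # how service was restored
    preventive_measures: str     # measures to prevent recurrence

    def missing_fields(self) -> list[str]:
        """Return the names of text fields that were left empty."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]


if __name__ == "__main__":
    report = IncidentReport(
        occurred_at=datetime(2024, 1, 15, 9, 30),
        severity="major",
        assignee="ops-engineer",
        responsible_person="service-owner",
        details="API latency exceeded the agreed threshold.",
        cause="Connection pool exhaustion.",
        actions_taken="Restarted the service and raised the pool size.",
        recovery_details="Latency returned to normal after the restart.",
        preventive_measures="",
    )
    print(report.missing_fields())  # lists fields still to be completed
```

A completeness check like this can be run before a report is closed so that preventive measures are not left undocumented.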
In addition to the mandatory operational management activities required by these compliance requirements, resource operation management items in the cloud environment must also be reviewed.
Source Program Management
Grant access to the source program only to authorized developers and prepare for emergencies through regular backups.
Migration to Production Environment
In a DevOps environment, application development and deployment are automated, and developers may sometimes perform deployments directly.
Even in these cases, it is more effective to delegate permissions only to specific personnel rather than granting deployment permissions to all developers, and to establish control by setting up approval procedures in the automated deployment process.
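The control described above, restricting deployment permission to specific personnel and requiring approval inside the automated process, can be sketched as a gate that a pipeline evaluates before a production deployment. Everything here is an assumption made for illustration: the `DeploymentRequest` type, the `AUTHORIZED_DEPLOYERS` set, and the rule that an author may not approve their own change are hypothetical and not part of any specific CI/CD product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of people delegated production deployment permission.
AUTHORIZED_DEPLOYERS = {"release-manager-a", "release-manager-b"}


@dataclass(frozen=True)
class DeploymentRequest:
    author: str                        # developer who made the change
    deployer: str                      # person triggering the deployment
    approved_by: Optional[str] = None  # approver recorded by the pipeline


def can_deploy(request: DeploymentRequest,
               authorized: set = AUTHORIZED_DEPLOYERS) -> tuple[bool, str]:
    """Decide whether a production deployment may proceed, with a reason."""
    if request.deployer not in authorized:
        return False, "deployer is not authorized"
    if request.approved_by is None:
        return False, "approval is missing"
    # Assumed rule: the approver must be someone other than the author.
    if request.approved_by == request.author:
        return False, "author cannot approve their own change"
    return True, "ok"


if __name__ == "__main__":
    request = DeploymentRequest(
        author="dev-kim",
        deployer="release-manager-a",
        approved_by="team-lead",
    )
    print(can_deploy(request))
```

Returning the reason alongside the decision lets the pipeline record why a deployment was blocked, which supports the audit trail that change control requires.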
Change Management
Even in the Samsung Cloud Platform environment, changes to the architecture, virtual servers, images, and related items must be managed systematically.
To achieve this, leverage IaC (Infrastructure as Code) tools to templatize changes and thoroughly document each change.
Additionally, you must review the state before and after applying changes to minimize unexpected impacts.
This protects the system from unexpected issues that may arise during changes and allows for quick recovery to a normal state.
This improves the stability and efficiency of change operations.
Additionally, you must continuously improve the quality of IaC code and processes through regular reviews and audits to ensure that security and compliance requirements are met.
Through this systematic approach, you can efficiently perform operational change management in cloud environments and maintain system stability and reliability.
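The before-and-after review described in this section can be sketched as a comparison of two state snapshots. This is a minimal illustration under stated assumptions: the snapshots are plain dictionaries of setting names to values, and `diff_state` and `unexpected_changes` are hypothetical helpers, not functions of any IaC tool.

```python
def diff_state(before: dict, after: dict) -> dict:
    """Compare two snapshots and report added, removed, and changed keys."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


def unexpected_changes(diff: dict, expected_keys: set) -> set:
    """Return keys touched by the change that the change plan did not list."""
    touched = set(diff["added"]) | set(diff["removed"]) | set(diff["changed"])
    return touched - expected_keys


if __name__ == "__main__":
    # Hypothetical virtual server settings captured before and after a change.
    before = {"image": "v1", "cpu": 2, "port": 80}
    after = {"image": "v2", "cpu": 4, "port": 80}

    diff = diff_state(before, after)
    # The change plan only covered the image upgrade.
    print(unexpected_changes(diff, {"image"}))
```

Flagging keys that changed outside the documented plan gives reviewers a concrete signal that a rollback or further investigation may be needed.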
Performance and Fault Management
To ensure the availability of information systems, procedures must be established that include criteria for identifying performance and capacity management targets, definitions of performance and capacity requirements (thresholds), monitoring methods, result recording and analysis, and response plans for when thresholds are exceeded.
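The procedure above (define thresholds, monitor, record results, and respond when thresholds are exceeded) can be sketched as a small evaluation step. The metric names, threshold values, and the `Threshold` and `evaluate` names below are assumptions chosen for illustration; real values come from the organization's own performance and capacity requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # name of the monitored metric
    warning: float    # value at which a warning is raised
    critical: float   # value at which a response plan is triggered


def evaluate(thresholds: list, samples: dict) -> dict:
    """Classify each monitored metric as ok, warning, critical, or no_data."""
    results = {}
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            results[t.metric] = "no_data"   # metric was not collected
        elif value >= t.critical:
            results[t.metric] = "critical"
        elif value >= t.warning:
            results[t.metric] = "warning"
        else:
            results[t.metric] = "ok"
    return results


if __name__ == "__main__":
    # Hypothetical thresholds, expressed as utilization percentages.
    thresholds = [
        Threshold("cpu_percent", warning=70, critical=90),
        Threshold("disk_percent", warning=80, critical=95),
        Threshold("memory_percent", warning=75, critical=90),
    ]
    samples = {"cpu_percent": 93.0, "disk_percent": 81.5}
    print(evaluate(thresholds, samples))
```

Treating a missing sample as its own status, rather than as "ok", keeps gaps in monitoring visible in the recorded results.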
The following table is an example of management items for cloud operations and summarizes the management items for Managed Services.
| Item | Explanation |
|---|---|
| Billing/Report | |
| Service Support | |
| Resource Management | |
| OS operation | |
| Incident response | |
| Technical Support | |
| Security | |
| Monitoring | |
| Report | |
Cloud Operations Organization Structure
To achieve true operational excellence in cloud environments, a collaborative, people‑centric organizational culture that can effectively support advanced technology adoption is essential.
This is because the core goal of cloud operational excellence is to achieve, simultaneously and in balance, two values that were once considered conflicting: business speed (agility) and service reliability.
Traditional IT organizations were built on a structure in which development and operations teams were clearly separated. Under this structure, the development team prioritized the rapid release of new features, while the operations team prioritized stable, failure-free operation.
These fundamental differences in objectives inevitably caused interdepartmental goal conflicts, and changes from the development team created bottlenecks during the operations team’s stability review stage, becoming the primary cause of reduced business agility.
The DevOps culture emerged to solve these chronic traditional operational problems and achieve the shared goal of speed and reliability.
A DevOps organizational structure does not merely separate development (Dev) and operations (Ops) into distinct teams; it is a collaborative framework that aligns the goals of both groups and shares responsibility across the entire service lifecycle (planning, development, deployment, operation).
Based on this DevOps culture, the role of modern cloud operations organizations is fundamentally redefined.
The operations organization no longer stays in the traditional role of controlling and managing changes to ensure stability.
Instead, to support both business speed and stability, it builds and provides an automated CI/CD pipeline, Infrastructure-as-Code (IaC) templates, and a platform with built-in monitoring and security, allowing development teams to deploy quickly and safely on their own.
| Category | Traditional IT operations organization | Cloud Operations Organization |
|---|---|---|
| Operational priority | Optimizing goals by department (e.g., development focuses on features, operations on stability) | Shared business goals (fast deployment and reliable service) |
| Role distinction | Clear delineation of technical areas (servers, networks, DB, security) | Emphasis on automation and efficiency; multiple technical domains are handled together across diverse technology stacks |
| Key role | System administrator, network administrator, database administrator, security administrator, etc. | SRE, DevOps engineer, cloud architect, security manager, etc. |
| Operating method and process | Focuses on tasks such as system updates and maintenance, with development and operations separated. | Performs automated updates and management tasks, with development and operations tasks connected or integrated. |
The operating model in a cloud environment may vary depending on how workloads are configured.
Cloud Operations within Existing Organizational Structures
During the migration of existing IT infrastructure to the cloud, it is common to adopt a Lift & Shift approach or to move only certain workloads to the cloud while keeping the remainder in an on‑premises environment.
In these scenarios, when designing a cloud operating model, you can consider two main approaches: adding a separate cloud operations organization within the existing IT operations organization, or integrating cloud operations tasks into the roles of the existing organization.
The first approach is to add a separate cloud operations organization.
This approach involves forming a specialized team tailored to the characteristics and requirements of the cloud environment, responsible for managing, monitoring, securing, and optimizing the cloud infrastructure.
It strengthens expertise in cloud environments and, by separating roles from the existing on-premises operations organization, enables focused management of each environment.
However, it may cause role duplication or added communication overhead within the organization.
The second approach is integrating cloud operations into existing organizational roles.
This approach enables the existing IT operations organization to manage both cloud and on-premises environments, establishing an integrated operations model across the two environments.
It also strengthens collaboration within the organization and is beneficial for maintaining consistent policies and processes between cloud and on-premises environments.
However, while it allows for efficient utilization of internal resources, it has the disadvantage that operational efficiency may decrease if there is insufficient expertise in cloud environments.
Infrastructure Operations After Cloud Migration
The roles of cloud infrastructure operations and development teams undergo significant changes when an organization’s primary workloads are migrated to the cloud or newly built on a cloud basis.
Previously, there was an on-premises operations organization structured by function, but after the cloud migration, many staff now manage cloud-based workloads, and the role of the operations organization is being reorganized to focus on cloud infrastructure.
In this structure, the development team continues to handle application-related tasks, while the operating model is designed so that platform operations are reorganized to align with the cloud environment.
The cloud infrastructure operations organization is designed considering the characteristics of the cloud environment and operates with a focus on cloud resource management, monitoring, security, and optimization.
This organization leverages cloud service provider (CSP) platforms to manage infrastructure efficiently and streamlines repetitive tasks through automation tools and scripts.
Furthermore, it provides the flexibility to respond quickly to business needs by leveraging the elasticity and scalability of the cloud environment.
The development team continues to be responsible for application development-related tasks and optimizes development processes in the cloud environment.
To achieve this, the development team designs a cloud-native architecture and builds microservices-based applications to maximize the benefits of the cloud environment.
Additionally, it enables rapid and reliable software deployment through CI/CD pipelines and facilitates seamless integration between cloud infrastructure and applications.
DevOps System Operations
If you decide to rebuild your organization’s key systems as cloud-based CI/CD applications, the operating model will also transform into an optimized Cloud One Team structure.
In this case, the cloud operations team and the development team are merged into a DevOps team, and their roles are redefined as a unit responsible for continuous integration and deployment processes.
This enhances collaboration between development and operations, enabling simultaneous rapid deployment and stable operations in a cloud environment.
Additionally, operations may be integrated in the form of DevSecOps to enhance security.
This refers to a structure where security, development, and operations work together organically as a single team, integrating security elements into the development and operations processes to enable building secure applications from the outset.
This integrated operating model maximizes efficiency and stability in cloud environments and establishes a foundation for effectively achieving organizational business goals.
Optimized Cloud One Team operations break down boundaries between teams and foster a culture of collaboration for common goals.
Through this, organizations can respond quickly and flexibly even in cloud environments, laying the foundation for continuous innovation and maintaining competitiveness.
