Data Storage Design
Storage Selection
Storage is one of the key factors affecting application performance.
All software applications interact with storage for installation, logging, and file access.
The optimal storage solution may vary depending on the following factors.
| Factor | Storage considerations |
|---|---|
| Access pattern | Sequential access, random access |
| Access frequency | Online (Hot), Offline (Warm), Archive (Cold) |
| Update frequency | High update frequency (operating system, database volume), low (file storage, etc.) |
| Access availability | Single-instance connection, shared connection |
The storage options provided by Samsung Cloud Platform are as follows.
| Category | Block Storage | File Storage | Object Storage | Archive Storage |
|---|---|---|---|---|
| Function | High-availability storage that keeps data in fixed-size blocks in a predefined arrangement and is allocated directly to a server | File storage that provides data access to heterogeneous clients over the network | Object storage that lets users store and use data over the Internet | Storage suited to long-term retention of large-scale data |
| Access configuration | Direct VM connection / Multi-Attach | NFS/CIFS | REST API (S3 compatible) | Connect to Object Storage |
| Access control | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project Public / Private Access Settings | |
| Encryption | Select KMS encrypted volume | Basic AES256 encryption applied | Select AES256 encryption | Select AES256 encryption |
| Data protection | VM snapshot | Snapshot | Version control | - |
| Disk | SSD | SSD/HDD | - | |
| Capacity | | No limit | No limit | No limit |
| Purpose | Operating systems, databases, and other high‑throughput data storage | Web content management, storage for entertainment data processing, container storage, big data analytics | Storing objects such as web content, logs, etc. | Long-term preservation of large-scale data |
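
The comparison above can be read as a simple lookup. The following is a minimal sketch of that reading; the function name `suggest_storage`, its parameters, and the order of the rules are illustrative assumptions rather than an official selection procedure, and real workloads (for example Block Storage with Multi-Attach) may call for a different choice.

```python
def suggest_storage(shared: bool, access_frequency: str, via_rest_api: bool = False) -> str:
    """Return a storage category loosely following the comparison table.

    All rules here are illustrative assumptions, not platform policy.
    """
    if access_frequency == "cold":
        return "Archive Storage"   # long-term retention of large-scale data
    if via_rest_api:
        return "Object Storage"    # objects such as web content and logs
    if shared:
        return "File Storage"      # NFS/CIFS access from multiple clients
    return "Block Storage"         # operating system and database volumes


print(suggest_storage(shared=False, access_frequency="hot"))   # Block Storage
print(suggest_storage(shared=True, access_frequency="warm"))   # File Storage
```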
Database Selection
Generally, databases are used to standardize a common platform and improve management efficiency.
You must select an appropriate database based on data requirements, as an improper selection can lead to increased system latency and performance degradation.
Database selection depends on the application’s requirements, such as availability, scalability, data structure, throughput, and durability.
Among the various factors to consider when selecting a database, access patterns significantly influence the choice of technology, so it is advisable to optimize the database based on these patterns.
Most databases provide configuration options for workload optimization, and you can review operational aspects such as scalability, backup, recovery, and maintenance, along with memory, cache, and storage optimization.
This section explores the features available to meet an application's database requirements.
OLTP (Online Transaction Processing)
Traditional relational databases are mostly used for Online Transaction Processing (OLTP) workloads.
Samsung Cloud Platform provides EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server as managed Database services.
Relational databases are suitable for applications handling complex business transactions, such as finance and e-commerce, and are advantageous for data aggregation and complex query processing.
Considerations for relational database optimization are as follows.
- Selection of server type, including computing, memory, storage, and networking
- Storage volume configuration
- Select the database engine that suits your needs
- Database options such as Schema, Index, and View
Relational databases can increase throughput through vertical scaling, and horizontal scaling of read operations is also possible via replicas.
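Read scaling through replicas usually relies on the application, or a proxy in front of the database, sending writes to the primary and reads to the replicas. The following is a minimal sketch of that routing idea. The class `ReadWriteRouter` and the endpoint names are hypothetical, and treating only `SELECT` statements as reads is a simplifying assumption.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Simplification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))        # replica-1
print(router.route("SELECT * FROM orders"))        # replica-2
print(router.route("UPDATE orders SET paid = 1"))  # primary-db
```

Because replicas may lag behind the primary, reads that must see the latest write are often routed to the primary as well.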
OLAP (Online Analytical Processing)
You can use a data warehouse platform to analyze large-scale structured data. On Samsung Cloud Platform, you can build a columnar, high-performance MPP analysis environment with Vertica (DBaaS).
Modern data warehouse technologies adopt a columnar format and use MPP (Massively Parallel Processing), both of which help improve data analysis speed.
When using columnar format, if you need to aggregate data from only one column, you do not need to scan the entire table.
As a result, less data is scanned compared to the row format, and query performance improves.
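The difference can be illustrated with a toy in-memory model. This is only a sketch: the sample records are made up, and counting "values read" is a stand-in for the disk I/O a real engine would save.

```python
rows = [
    {"order_id": 1, "region": "KR", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "KR", "amount": 200},
]

# Row format: every field of every record is read to reach "amount".
row_values_read = sum(len(record) for record in rows)
row_total = sum(record["amount"] for record in rows)

# Columnar format: each column is stored contiguously on its own.
columns = {name: [record[name] for record in rows] for name in rows[0]}
col_values_read = len(columns["amount"])
col_total = sum(columns["amount"])

print(row_total, col_total)              # 400 400
print(row_values_read, col_values_read)  # 9 3
```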
In MPP, data is distributed across child nodes, and queries are coordinated by the leader node.
The leader node distributes each query to the child nodes based on the partition key.
Each child node then processes its part of the query in parallel.
Finally, the leader node collects the query results from the child nodes and returns the aggregated result.
This parallel processing method accelerates query execution and enables faster processing of large volumes of data.
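The leader/child flow described above can be sketched as a scatter-gather aggregation. This is a toy model, not how Vertica or any specific engine is implemented: the node count, the byte-sum partitioning function, and the use of threads in place of separate machines are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Map a partition key to a node using a stable byte sum."""
    return sum(partition_key.encode()) % NUM_NODES


def distribute(rows):
    """Place each row on a child node according to its partition key."""
    nodes = defaultdict(list)
    for region, amount in rows:
        nodes[node_for(region)].append((region, amount))
    return nodes


def partial_sum(node_rows):
    """Child node step: aggregate only the rows stored locally."""
    totals = defaultdict(int)
    for region, amount in node_rows:
        totals[region] += amount
    return totals


def run_query(rows):
    """Leader step: fan out to the nodes in parallel, then merge partial results."""
    nodes = distribute(rows)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(partial_sum, nodes.values()))
    merged = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            merged[region] += total
    return dict(merged)


sales = [("KR", 120), ("US", 80), ("KR", 200), ("DE", 50)]
print(run_query(sales))
```

Because rows with the same partition key always land on the same node, each node can finish its share of the `SUM ... GROUP BY region` work without talking to the others.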
NoSQL
Various applications, such as social media, the Internet of Things, clickstream data, and logs, generate large amounts of unstructured and semi-structured data.
This data has a dynamic schema, and each record can have a different structure.
Storing this data in a relational database can be inefficient.
Since relational databases must store data based on a fixed schema, unnecessary null values may be stored or data loss may occur.
NoSQL databases can store unstructured data flexibly without being constrained by a fixed schema.
Each record can have a different number of columns and can be stored in the same table.
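As a small illustration, the sketch below keeps records of different shapes in one collection and then counts how many nulls appear when the same records are forced into a single fixed set of columns. The sample events and field names are invented for the example.

```python
import json

# Records of different shapes kept in one collection (document-style storage).
events = [
    {"id": 1, "type": "click", "url": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "unit": "C"},
    {"id": 3, "type": "log", "message": "started"},
]

# A fixed schema needs the union of all fields, and every field
# a record lacks becomes a null (None).
fixed_columns = sorted({key for event in events for key in event})
table = [[event.get(column) for column in fixed_columns] for event in events]
null_count = sum(cell is None for row in table for cell in row)

print(json.dumps(events[1]))
print(fixed_columns)
print(null_count)   # 8
```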
NoSQL databases support large-scale data storage and provide low latency.
Additionally, they support horizontal scaling by default and can be easily scaled by adding nodes as needed.
However, many NoSQL databases do not support complex queries such as joins across tables or entities, so a relational database is more suitable when such queries are required.
On Samsung Cloud Platform, you can use CacheStore, a Redis-based in-memory database, for high-performance database caching or application state storage.
Retrieving Data
There are times when you need to quickly search large volumes of data to rapidly resolve issues or gain business insights.
Searching application data helps you access detailed information and analyze it from various perspectives.
To retrieve data with low latency and high throughput, search engine technology is generally used.
Samsung Cloud Platform provides the Search Engine service.
Search Engine provides automated creation and configuration of Elasticsearch for data analysis.
The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configurations.
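Search engines typically achieve fast lookups with an inverted index, which maps each term to the documents that contain it. The sketch below shows the idea in plain Python. It is not the Elasticsearch API, and the whitespace tokenizer and sample documents are simplifying assumptions.

```python
from collections import defaultdict


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase token to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query: str) -> set[int]:
    """Return ids of documents that contain every token of the query."""
    token_sets = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*token_sets) if token_sets else set()


docs = {
    1: "payment timeout on checkout",
    2: "login timeout after deploy",
    3: "checkout page renders slowly",
}
index = build_index(docs)
print(search(index, "timeout"))           # {1, 2}
print(search(index, "checkout timeout"))  # {1}
```

A query only touches the entries for its own terms, so lookup cost depends on the number of matching documents rather than on the total volume of stored text.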
Database Performance Improvement
DB Optimization
Database performance improvement refers to designing and operating a database so that it maintains its performance for as long as possible. It prioritizes overall business efficiency, such as response speed and throughput, over the performance management of a standalone server.
Database performance degrades due to continuous business changes, increasing concurrent users, and continuous data growth.
Generally, database performance is defined by the response time to user requests.
Optimal database performance can be defined as achieving maximum performance with minimal resources.
Factors that degrade database performance can arise from the initial analysis and design phases through to the development and pre-operation stages.
| Stage | Optimization area | Content |
|---|---|---|
| Analysis | Business Process Optimization | Eliminate inefficient elements, and perform process optimization that aligns with the business vision and strategy |
| Analysis | Architecture | Set the architectural direction by considering transaction throughput, performance, data growth trends, security, and availability. |
| Design | Physical design | Response time, distributed DB environment, concurrent user count, data size, parallel processing and distribution, centralization, design for redundancy |
| Design | Application Design | Access paths, data request forms, and index considerations for optimal performance when integrated with the DB. |
| Development | SQL | Improve developers' skills and enforce development standards so that SQL adheres to performance policies. |
| Operation | OS tuning | Perform tuning for CPU, memory, disk I/O, and related components. |
| Operation | Network tuning | Perform tuning based on the volume of data, files, and other transfers. |
| Operation | DB tuning | Perform tuning of elements such as data architecture, parameters, and log files |
| Operation | Application Tuning | Continuously monitor the production system and perform tuning by applying SQL and index policies for performance‑degrading applications, as well as cluster policies. |
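
As one concrete example of applying an index policy during application tuning, the sketch below compares the query plan for the same statement before and after an index is created. SQLite is used only because it ships with Python; the table, column, and index names are hypothetical, and the managed engines listed earlier each have their own `EXPLAIN` syntax and output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT amount FROM orders WHERE customer_id = 42"


def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(conn, QUERY)   # expected to report a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(conn, QUERY)    # expected to report a search using the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the scan or index keywords than to compare whole strings.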
Caching Implementation
Caching is the process of temporarily storing data or files in an intermediate location between the client and persistent storage to process future requests faster and reduce network traffic.
Caching improves application performance and reduces costs by reusing previously retrieved data. The following table shows the caching mechanisms at each layer.
| Layer | Target | Caching implementation |
|---|---|---|
| Web layer | Web content | Improve web content delivery latency from the web server → Deliver content using a Global CDN |
| Application layer | User session data | Improve application performance and data access performance using a key/value store and local cache → Manage state using CacheStore |
| Database layer | Data | Reduce database query latency using a database buffer and key/value store → Cache data with CacheStore and offload read load by configuring replicas |
Performance efficiency of the web layer primarily relates to the delivery of static content such as images, videos, and HTML pages.
The closer this static content is served to the user’s geographic location, the lower the latency and the faster the response.
By implementing caching using a Global CDN, you can deliver content from locations closer to the user, providing a better user experience.
By applying caching in the application layer to store the results of complex repetitive requests, you can reduce business logic calculations and database access.
Furthermore, by implementing a state management database to separate state storage from the application server, you can improve service performance while avoiding session loss or concentration during horizontal scaling.
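The sketch below illustrates why externalized state survives scale-out: two application servers share one session store, so either can continue a session the other started. `SessionStore` is a hypothetical stand-in backed by a dictionary. In practice a key/value service such as CacheStore would play this role, and its client API is not shown here.

```python
class SessionStore:
    """Stand-in for an external key/value store; a dict is used for illustration."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def set(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """Stateless application server: session state lives only in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.set(session_id, {**session, "cart": cart})
        return cart


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)   # for example, added during scale-out

server_a.add_to_cart("session-123", "keyboard")
print(server_b.add_to_cart("session-123", "mouse"))   # ['keyboard', 'mouse']
```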
Generally, the speed and throughput of the entire service depend on database performance.
For services using relational databases, write capacity cannot easily be increased by scaling out horizontally, and vertical scaling has limits, so significant effort is required for performance management.
Applying caching to the database significantly increases database throughput and reduces data retrieval latency.
Placing a Redis-based CacheStore (DBaaS) in front of the database or configuring a Replica of the Database service to distribute read load is also an effective strategy for improving performance.
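One common way to place a cache in front of a database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and store the result with a time-to-live. The following is a minimal sketch under stated assumptions. The class `CacheAside`, the `slow_query` function, and the in-process dictionary are hypothetical stand-ins for a Redis-based cache and a real database query.

```python
import time


class CacheAside:
    """Cache-aside read path: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry and entry[0] > self._clock():
            return entry[1]         # cache hit
        value = self._load(key)     # cache miss or expired entry
        self.db_reads += 1
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


def slow_query(product_id):
    """Placeholder for an expensive database query."""
    return {"id": product_id, "name": f"product-{product_id}"}


cache = CacheAside(slow_query, ttl_seconds=30)
cache.get(7)
cache.get(7)
print(cache.db_reads)   # 1
```

The time-to-live bounds how stale a cached value can become, and calling `invalidate` after a write narrows that window further.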