The page has been translated by Gen AI.

Data Storage Design

Storage Selection

Storage is one of the key factors affecting application performance.

All software applications interact with storage for installation, logging, and file access.

The optimal storage solution may vary depending on the following factors.

| Access method | Storage considerations |
| --- | --- |
| Access pattern | Sequential access, random access |
| Access frequency | Online (hot), offline (warm), archive (cold) |
| Update frequency | High (operating system, database volumes), low (file storage, etc.) |
| Access availability | Single-instance connection, shared connection |

Table. Storage selection considerations

The storage options provided by Samsung Cloud Platform are as follows.

| Category | Block Storage | File Storage | Object Storage | Archive Storage |
| --- | --- | --- | --- | --- |
| Function | Stores data in fixed-size blocks arranged in a predefined array; a high-availability storage service allocated directly to a server | File storage that provides data access to heterogeneous clients over the network | Object storage service that lets users store and use data over the Internet | Storage service suited to long-term retention of large-scale data |
| Access configuration | Direct VM connection / Multi-Attach | NFS/CIFS | REST API (S3 compatible) | Connect to Object Storage |
| Access control | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project Public / Private Access Settings | |
| Encryption | Select KMS encrypted volume | Basic AES256 encryption applied | Select AES256 encryption | Select AES256 encryption |
| Data protection | VM snapshot | Snapshot | Version control | - |
| Disk | SSD | SSD/HDD | - | |
| Capacity | Base OS: 16 GB to 12,288 GB; additional volume: 8 GB to 12,288 GB, up to 23 can be added | No limit | No limit | No limit |
| Purpose | Operating systems, databases, and other high-throughput data storage | Web content management, storage for entertainment data processing, container storage, big data analytics | Storing objects such as web content, logs, etc. | Long-term preservation of large-scale data |

Table. Storage options on Samsung Cloud Platform
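
As a rough illustration of how the selection factors map onto the four storage options, the following Python sketch encodes a few of them as a decision function. The `Workload` fields and the rule order are assumptions made for this example and simplify the table considerably (for instance, Block Storage Multi-Attach is ignored); it is not an official selection procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    shared_access: bool      # must several servers access the data at once?
    high_update_rate: bool   # OS or database volume style updates
    long_term_archive: bool  # cold data kept mainly for retention


def suggest_storage(w: Workload) -> str:
    """Return a storage option name using simplified, assumed rules."""
    if w.long_term_archive:
        return "Archive Storage"
    if w.high_update_rate and not w.shared_access:
        return "Block Storage"
    if w.shared_access:
        return "File Storage"
    # Everything else (web content, logs, ...) is treated as objects.
    return "Object Storage"


database_volume = Workload(shared_access=False, high_update_rate=True, long_term_archive=False)
shared_content = Workload(shared_access=True, high_update_rate=False, long_term_archive=False)
print(suggest_storage(database_volume))
print(suggest_storage(shared_content))
```

A real decision would also weigh encryption, data protection, and capacity, which the sketch leaves out.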

Database Selection

Generally, databases are used to standardize a common platform and improve management efficiency.

You must select an appropriate database based on data requirements, as an improper selection can lead to increased system latency and performance degradation.

Database selection depends on the application’s requirements, such as availability, scalability, data structure, throughput, and durability.

Among the various factors to consider when selecting a database, access patterns significantly influence the choice of technology, so it is advisable to optimize the database based on these patterns.

Most databases provide configuration options for workload optimization, and you can review operational aspects such as scalability, backup, recovery, and maintenance, along with memory, cache, and storage optimization.

This document explores features that help meet an application’s database requirements.

OLTP (Online Transaction Processing)

Traditional relational databases are mostly used for Online Transaction Processing (OLTP) workloads.

Samsung Cloud Platform provides EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server as managed database services.

Relational databases are suitable for applications handling complex business transactions, such as finance and e-commerce, and are advantageous for data aggregation and complex query processing.

Considerations for relational database optimization are as follows.

  • Server type selection, including compute, memory, storage, and networking
  • Storage volume configuration
  • Selection of a database engine that suits your needs
  • Database options such as schema, index, and view

Relational databases can increase throughput through vertical scaling, and horizontal scaling of read operations is also possible via replicas.
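
Horizontal scaling of reads through replicas is commonly paired with read/write splitting in the application or a proxy. The sketch below shows the idea only; the `ReplicaRouter` class and the node names are hypothetical and are not part of any Samsung Cloud Platform API.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Simplification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route(query)
    for query in (
        "SELECT * FROM orders",
        "UPDATE orders SET status = 'paid' WHERE id = 1",
        "SELECT count(*) FROM orders",
    )
]
print(targets)  # ['replica-1', 'primary', 'replica-2']
```

Because replicas may lag behind the primary, reads that must observe a just-committed write are usually still sent to the primary.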

OLAP (Online Analytical Processing)

You can utilize a data warehouse platform to analyze large-scale structured data, and on Samsung Cloud Platform, you can implement a column-based, high-performance MPP analysis environment through Vertica (DBaaS).

Modern data warehouse technologies adopt a columnar format and use MPP (Massively Parallel Processing), which helps improve data analysis speed.

When using columnar format, if you need to aggregate data from only one column, you do not need to scan the entire table.

As a result, less data is scanned compared to the row format, and query performance improves.
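
The difference can be illustrated with a toy in-memory model. The table contents and the "values touched" counter below are assumptions for the example; a real engine works on disk blocks and compression, not on Python lists.

```python
rows = [
    {"order_id": 1, "region": "KR", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "KR", "amount": 50},
]

# Columnar layout: one contiguous list per column.
columns = {name: [record[name] for record in rows] for name in rows[0]}


def sum_row_store(table, column):
    """Sum one column while counting every value that had to be read."""
    touched = 0
    total = 0
    for record in table:
        touched += len(record)  # a row store reads the whole record
        total += record[column]
    return total, touched


def sum_column_store(table, column):
    """Sum one column by reading only that column's values."""
    values = table[column]
    return sum(values), len(values)


row_total, row_touched = sum_row_store(rows, "amount")
col_total, col_touched = sum_column_store(columns, "amount")
print(row_total, row_touched)  # 250 9
print(col_total, col_touched)  # 250 3
```

Both layouts return the same sum, but the columnar layout reads one third of the values in this three-column table.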

In an MPP system, data is distributed across child nodes, and queries are submitted to a leader node.

The leader node distributes each query to the child nodes based on the partition key.

Each child node then processes its part of the query in parallel.

Finally, the leader node collects the partial results from each child node and returns the aggregated result.

This parallel processing method accelerates query execution and enables faster processing of large volumes of data.
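
The leader and child node flow described above resembles a scatter-gather pattern. The following sketch simulates it with threads in a single process; the node count, the `customer_id` partition key, and the function names are assumptions for illustration and do not reflect how Vertica is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3


def partition(rows, key, num_nodes):
    """Distribute rows across nodes by the partition key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes


def node_partial_sum(node_rows):
    """Work done by one child node: aggregate only its own rows."""
    return sum(row["amount"] for row in node_rows)


def leader_sum(rows):
    """Leader: fan the query out, then merge the partial results."""
    nodes = partition(rows, "customer_id", NUM_NODES)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(node_partial_sum, nodes))
    return sum(partials), partials


sales = [{"customer_id": i, "amount": i * 10} for i in range(1, 7)]
total, partials = leader_sum(sales)
print(partials, total)  # [90, 50, 70] 210
```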

NoSQL

Various applications, such as social media, the Internet of Things, clickstream data, and logs, generate large amounts of unstructured and semi-structured data.

This data has a dynamic schema, and each record can have a different structure.

Storing this data in a relational database can be inefficient.

Since relational databases must store data based on a fixed schema, unnecessary null values may be stored or data loss may occur.

NoSQL databases can store data flexibly without being constrained by a fixed schema.

Each record can have a different number of columns and can be stored in the same table.
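
To make the contrast concrete, the sketch below stores three differently shaped records first in a fixed-schema layout and then as self-describing documents. The event records are made-up sample data.

```python
import json

events = [
    {"id": 1, "type": "click", "page": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "unit": "C"},
    {"id": 3, "type": "log", "level": "WARN", "message": "disk 90% full"},
]

# Fixed schema: every record must carry every column, so absent
# fields become nulls (None).
all_columns = sorted({key for event in events for key in event})
fixed_rows = [[event.get(col) for col in all_columns] for event in events]
null_cells = sum(cell is None for row in fixed_rows for cell in row)

# Flexible schema: each document is serialized with only its own fields.
documents = [json.dumps(event) for event in events]

print(len(all_columns), null_cells)  # 7 10
print(documents[1])
```

Ten of the 21 cells in the fixed-schema layout are nulls, while the document form stores only the fields each record actually has.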

NoSQL databases support large-scale data storage and provide low latency.

Additionally, they support horizontal scaling by default and can be easily scaled by adding nodes as needed.

However, NoSQL databases generally do not support complex queries such as joins across tables and entities, so a relational database is more suitable when such queries are required.

On Samsung Cloud Platform, you can use CacheStore, a Redis-based in-memory database, for high-performance database caching or application state storage.

Retrieving Data

There are times when you need to quickly search large volumes of data to rapidly resolve issues or gain business insights.

Searching application data helps you access detailed information and analyze it from various perspectives.

To retrieve data with low latency and high throughput, search engine technology is typically used.

Samsung Cloud Platform provides the Search Engine service.

Search Engine provides automated creation and configuration of Elasticsearch for data analysis.

The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configurations.
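
Search engines such as Elasticsearch answer text queries quickly largely because they maintain an inverted index that maps each term to the documents containing it. The following is a minimal sketch of that data structure only, with made-up documents and naive whitespace tokenization; it is not the Elasticsearch API.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "payment service timeout error",
    2: "login service started",
    3: "payment completed",
}
index = build_index(docs)
print(search(index, "payment service"))  # {1}
print(search(index, "service"))          # {1, 2}
```

A lookup touches only the posting sets for the query terms instead of scanning every document.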

Database Performance Improvement

DB Optimization

Database performance improvement means designing and operating a database so that it sustains its performance for as long as possible. The priority is overall business efficiency, such as response speed and throughput, rather than performance management of a standalone server.

Database performance degrades due to continuous business changes, increasing concurrent users, and continuous data growth.

Generally, database performance is defined by the response time to user requests.

Optimal database performance can be defined as achieving maximum performance with minimal resources.

Factors that degrade database performance can arise from the initial analysis and design phases through to the development and pre-operation stages.

| Step | Optimization section | Content |
| --- | --- | --- |
| Analysis | Business process optimization | Eliminate inefficient elements and optimize processes in line with the business vision and strategy |
| Analysis | Architecture | Set the architectural direction, considering transaction throughput, performance, data growth trends, security, and availability |
| Design | Physical design | Response time, distributed DB environment, concurrent user count, data size, parallel processing and distribution, centralization, design for redundancy |
| Design | Application design | Access paths, data request forms, and index considerations for optimal performance when integrated with the DB |
| Development | SQL | Ensure adherence to performance policies by improving developers’ skills and enforcing compliance with development standards |
| Operation | OS tuning | Tune CPU, memory, disk I/O, and related components |
| Operation | Network tuning | Tune based on the volume of data, files, and other transfers |
| Operation | DB tuning | Tune elements such as data architecture, parameters, and log files |
| Operation | Application tuning | Continuously monitor the production system and tune performance-degrading applications by applying SQL and index policies, as well as cluster policies |

Table. Database optimization areas by project phase
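
As one concrete instance of the SQL and index tuning listed for the operation phase, the sketch below uses Python's built-in `sqlite3` module to compare the query plan before and after adding an index. SQLite is used only because it runs without setup; the table and index names are assumptions, and managed engines such as MySQL or PostgreSQL expose their own `EXPLAIN` output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail text of the query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)


query = "SELECT amount FROM orders WHERE customer_id = 42"
before = plan(query)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # expected to describe a search using the new index

print(before)
print(after)
```

The exact plan wording differs between SQLite versions, so the sketch prints the plans rather than assuming a fixed string.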

Caching Implementation

Caching is the process of temporarily storing data or files in an intermediate location between the client and persistent storage to process future requests faster and reduce network traffic.

Caching improves application performance and reduces costs by reusing previously retrieved data. The following table summarizes the caching mechanisms at each layer.

| Layer | Target | Caching implementation |
| --- | --- | --- |
| Web layer | Web content | Improve the web server's content delivery latency → Content delivery using a Global CDN |
| Application layer | User session data | Improve application performance and data access performance using a key/value store and local cache → State management using CacheStore |
| Database layer | Data | Reduce database query latency using a database buffer and key/value store → Data caching with CacheStore and read offloading by configuring replicas |

Table. Caching implementation by layer

Performance efficiency of the web layer primarily relates to the delivery of static content such as images, videos, and HTML pages.

The closer this static content is served to the user’s geographic location, the lower the latency and the faster the response.

By implementing caching using a Global CDN, you can deliver content from locations closer to the user, providing a better user experience.

By applying caching in the application layer to store the results of complex repetitive requests, you can reduce business logic calculations and database access.

Furthermore, by implementing a state management database to separate state storage from the application server, you can improve service performance while avoiding session loss or sessions concentrating on particular servers during horizontal scaling.
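
The effect of separating state from the application server can be sketched as follows. Plain dictionaries stand in for the session store, and the `AppServer` class is hypothetical; in practice the shared store would be an external service such as CacheStore reached over the network.

```python
class AppServer:
    """Hypothetical application server that keeps sessions in a given store."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, session_id, user):
        self.sessions[session_id] = {"user": user}

    def whoami(self, session_id):
        session = self.sessions.get(session_id)
        return session["user"] if session else None


# Local state: each server keeps its own store, so a request routed to
# another server after scale-out cannot find the session.
local_a, local_b = AppServer("a", {}), AppServer("b", {})
local_a.login("s1", "kim")
lost = local_b.whoami("s1")

# Shared state: both servers use the same external store.
shared_store = {}
shared_a, shared_b = AppServer("a", shared_store), AppServer("b", shared_store)
shared_a.login("s1", "kim")
kept = shared_b.whoami("s1")

print(lost, kept)  # None kim
```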

Generally, the speed and throughput of the entire service depend on database performance.

For services using relational databases, write capacity is difficult to increase by scaling out horizontally, and because vertical scaling has limitations, significant effort is required for performance management.

Applying caching to the database significantly increases database throughput and reduces data retrieval latency.

Placing a Redis-based CacheStore (DBaaS) in front of the database or configuring a replica of the database service to distribute read load is also an effective strategy for improving performance.
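
Placing a cache in front of the database is often implemented with the cache-aside pattern: read from the cache first, fall back to the database on a miss, and then populate the cache. The sketch below is one possible shape of that pattern under stated assumptions: a dictionary stands in for CacheStore, `database.get` stands in for a query, and the class name and TTL value are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live, using an in-process dict."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value); stand-in for CacheStore
        self.db_reads = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > self._clock():
            return entry[1]  # cache hit: no database access
        self.db_reads += 1   # cache miss or expired entry
        value = self._load(key)
        self._store[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._store.pop(key, None)


database = {"user:1": "Kim", "user:2": "Lee"}
cache = CacheAside(database.get)
first = cache.get("user:1")
second = cache.get("user:1")
print(first, second, cache.db_reads)  # Kim Kim 1
```

When the underlying row is updated, calling `invalidate` (or waiting for the TTL to expire) keeps the cache from serving stale data indefinitely.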