The page has been translated by Gen AI.

Data Storage Design

Select Storage

Storage is one of the key factors that affect application performance.

All software applications interact with storage for installation, logging, and file access.

The optimal storage solution can vary depending on the following factors.

| Access method | Storage considerations |
|---|---|
| Access pattern | Sequential access, random access |
| Access frequency | Online (Hot), Offline (Warm), Archive (Cold) |
| Update frequency | High (operating system, database volume), low (file storage, etc.) |
| Access availability | Single-instance connection, shared connection |
Table. Storage selection considerations

The storage options provided by Samsung Cloud Platform are as follows.

| Category | Block Storage | File Storage | Object Storage | Archive Storage |
|---|---|---|---|---|
| Function | Data is stored in fixed-size blocks of a predefined array, and the high-availability storage service is allocated directly on the server for use. | File storage that provides data access to heterogeneous clients over the network | Object storage service built to enable users to store and use desired data on the Internet | A storage service suitable for long-term retention of large-scale data |
| Access configuration | Direct VM connection / Multi-Attach | NFS/CIFS | REST API (S3 compatible) | Connect to Object Storage |
| Access control | - | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project public / private access settings |
| Encryption | Optional KMS-encrypted volume | AES256 encryption applied by default | Optional AES256 encryption | Optional AES256 encryption |
| Data protection | VM snapshot | Snapshot | Version control | - |
| Disk | SSD | SSD/HDD | - | - |
| Capacity | Base OS: 16 GB to 12,288 GB; additional volume: 8 GB to 12,288 GB, up to 23 can be added | No limit | No limit | No limit |
| Purpose | Operating systems, databases, and other high-throughput data storage | Web content management, storage for entertainment data processing, container storage, big data analysis | Storing objects such as web content, logs, etc. | Long-term preservation of large-scale data |
Table. Samsung Cloud Platform storage options
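
The selection factors and the options table can be combined into a simple decision rule. The following is a minimal sketch only: the `Workload` fields, the `suggest_storage` function, and the order of the checks are assumptions made for illustration, not part of any Samsung Cloud Platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of how an application uses storage."""
    shared_access: bool      # several servers mount the same data (NFS/CIFS)
    rest_api_access: bool    # data is accessed over an S3-compatible REST API
    long_term_archive: bool  # data is retained for a long time and rarely read


def suggest_storage(w: Workload) -> str:
    """Return one option from the table; the rule order is an assumption."""
    if w.long_term_archive:
        return "Archive Storage"
    if w.rest_api_access:
        return "Object Storage"
    if w.shared_access:
        return "File Storage"
    # Default: a volume attached directly to a VM (operating system, database).
    return "Block Storage"


database_volume = Workload(shared_access=False, rest_api_access=False,
                           long_term_archive=False)
suggestion = suggest_storage(database_volume)
```

A real decision would also weigh capacity, encryption, and data protection requirements from the table above.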

Select Database

Generally, databases are used to standardize the common platform and improve management efficiency.

You must select an appropriate database based on data requirements. An unsuitable choice can lead to increased system latency and performance degradation.

Choosing a database depends on the application’s requirements such as availability, scalability, data structure, throughput, durability, and so on.

Among the many factors to consider when choosing a database, access patterns have a significant impact on technology selection, so it is advisable to optimize the database based on them.

Most databases provide configuration options for workload optimization, and you can also review operational aspects such as scalability, backup, recovery, and maintenance, along with memory, cache, and storage optimization.

In this document, we will examine various features to meet the application’s database requirements.

OLTP (Online Transaction Processing)

Most traditional relational databases use the online transaction processing (OLTP) model.

Samsung Cloud Platform provides managed Database services for relational databases such as EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server.

Relational databases are suitable for applications that handle complex business transactions such as finance and e‑commerce, and they are advantageous for data aggregation and processing complex queries.

The considerations for relational database optimization are as follows.

  • Server type selection including computing, memory, storage, and networking
  • Storage volume configuration
  • Select a database engine that fits your needs
  • Database options such as schema, index, and view

Relational databases can increase throughput through vertical scaling, and horizontal scaling of read operations is also possible using replicas.
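
Scaling reads with replicas usually requires the application or a proxy to send writes to the primary and reads to the replicas. The following is a minimal sketch of that routing idea. The `ReplicaRouter` class and the node names are hypothetical, and a real router would also need to handle transactions and replication lag.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        # Simplification: treat only plain SELECT statements as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [router.target_for(q) for q in (
    "SELECT * FROM orders",
    "SELECT * FROM users",
    "INSERT INTO orders VALUES (1)",
    "SELECT 1",
)]
```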

OLAP (Online Analytical Processing)

To analyze large-scale structured data, you can use a data warehouse platform. On Samsung Cloud Platform, you can implement a column-oriented, high-performance MPP analytics environment through Vertica (DBaaS).

The latest data warehouse technologies adopt a columnar format and use MPP (Massively Parallel Processing), which helps improve data analysis speed.

With a columnar format, if you need to aggregate data from only a single column, you don’t need to scan the entire table.

As a result, the amount of data scanned is reduced compared to the row format, and query performance improves.
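
The difference in scanned data can be illustrated with a small model. This is a sketch under stated assumptions: the records are made up, and counting the values touched is only a stand-in for the disk I/O a real engine would perform.

```python
# The same three records in a row layout and in a column layout (made-up data).
rows = [
    {"order_id": 1, "region": "KR", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "KR", "amount": 50},
]
columns = {
    "order_id": [1, 2, 3],
    "region": ["KR", "US", "KR"],
    "amount": [120, 80, 50],
}


def sum_amount_row_layout(rows):
    """Model a row store: whole records are read to reach 'amount'."""
    values_read = sum(len(r) for r in rows)
    return sum(r["amount"] for r in rows), values_read


def sum_amount_column_layout(columns):
    """Model a column store: only the 'amount' column is read."""
    amount = columns["amount"]
    return sum(amount), len(amount)


row_total, row_values_read = sum_amount_row_layout(rows)
col_total, col_values_read = sum_amount_column_layout(columns)
```

Both layouts return the same total, but the column layout touches 3 values instead of 9 in this model.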

MPP stores data by distributing it among the subordinate nodes, and queries are coordinated by the leader node.

The leader node distributes queries to the subordinate nodes based on the partition key.

Each node then takes its portion of the query and processes it in parallel with the other nodes.

After that, the leader node collects query results from each subordinate node and returns the aggregated result.

This parallel processing method speeds up query execution and enables larger volumes of data to be processed more quickly.
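
The leader and subordinate flow described above can be sketched as a scatter-gather aggregation. This is an illustration only, not how Vertica is implemented: `NODE_COUNT`, the CRC32-based placement, and the function names are assumptions, and Python threads merely stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NODE_COUNT = 3  # assumed number of subordinate nodes


def node_for(partition_key: str) -> int:
    """Map a partition key to a node with a stable hash."""
    return crc32(partition_key.encode()) % NODE_COUNT


def distribute(records):
    """Store each record on the node chosen by its partition key."""
    nodes = [[] for _ in range(NODE_COUNT)]
    for rec in records:
        nodes[node_for(rec["customer"])].append(rec)
    return nodes


def partial_sum(node_rows):
    """Work done by one subordinate node on its own data."""
    return sum(r["amount"] for r in node_rows)


def leader_total(nodes):
    """Leader: send the query to every node in parallel, then merge results."""
    with ThreadPoolExecutor(max_workers=NODE_COUNT) as pool:
        return sum(pool.map(partial_sum, nodes))


records = [{"customer": f"c{i}", "amount": i} for i in range(1, 11)]
total = leader_total(distribute(records))
```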

NoSQL

In various applications such as social media, the Internet of Things, clickstream data, and logs, large amounts of unstructured and semi-structured data are generated.

Such data has a dynamic schema, and each record can have a different structure.

Storing such data in a relational database can be inefficient.

Relational databases must store data based on a fixed schema, which can result in unnecessary null values being stored or data loss occurring.

NoSQL (non-relational) databases can store data flexibly without being constrained by a fixed schema.

Records with different numbers of columns can be stored in the same table.
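
The effect of a fixed schema on records with different structures can be shown with plain Python data. The event records and column names below are made up for illustration, and a list of dictionaries only stands in for a schema-less collection.

```python
# Two made-up event records with different fields.
events = [
    {"id": 1, "type": "click", "page": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "humidity": 40},
]

# Fixed schema: every record must provide every column, so gaps become None.
fixed_columns = ["id", "type", "page", "temperature", "humidity"]
fixed_rows = [[e.get(col) for col in fixed_columns] for e in events]
null_count = sum(value is None for row in fixed_rows for value in row)

# Schema-less collection: each record keeps only the fields it has.
collection = list(events)
stored_fields = sum(len(doc) for doc in collection)
```

In this example the fixed schema stores 10 cells, 3 of them null, while the schema-less collection stores only the 7 fields that exist.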

NoSQL databases can store large volumes of data and provide low latency.

They also natively support horizontal scaling, so capacity can be expanded by adding nodes when needed.

However, NoSQL databases generally do not support complex queries such as joins across tables or entities, so a relational database is more appropriate when such queries are required.

In Samsung Cloud Platform, you can use CacheStore as a Redis-based in-memory database, which can be used for high-performance database caching or for storing application state.

There are cases where you need to quickly search large amounts of data to promptly resolve issues or gain business insights.

Searching application data provides access to detailed information and helps analyze it from various perspectives.

To retrieve data with low latency and high throughput, search engine technology is required.

Samsung Cloud Platform provides a Search Engine service.

Search Engine automates the creation and configuration of Elasticsearch for data analysis.

The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configurations.
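
Search engines such as Elasticsearch answer text queries quickly by looking terms up in an inverted index rather than scanning every document. The following is a toy sketch of that idea, not the Elasticsearch API: the function names and the log messages are made up for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "payment timeout error in checkout",
    2: "login error for user",
    3: "checkout completed",
}
index = build_index(docs)
error_hits = search(index, "error")
checkout_error_hits = search(index, "checkout error")
```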

Database Performance Improvement

DB Optimization

Database performance improvement means designing and operating the database so that it maintains its performance for as long as possible. Efficiency from a business-wide perspective, such as response time and throughput per unit time, is more important than managing the performance of the server alone.

Database performance degrades due to continuous business changes, increasing concurrent users, and ongoing data growth.

Generally, database performance is defined as the response time to a user’s request.

Optimal database performance can be defined as achieving maximum performance with minimal resources.
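
Response time and throughput per unit time can be derived from the same set of measurements. The following sketch uses made-up sample values, and the variable names are assumptions for illustration.

```python
import statistics

# Made-up response times (seconds) for queries completed in a 2-second window.
response_times = [0.020, 0.035, 0.015, 0.250, 0.030, 0.025, 0.040, 0.022]
window_seconds = 2.0

mean_response = statistics.mean(response_times)
worst_response = max(response_times)
throughput = len(response_times) / window_seconds  # queries per second
```

A single slow query (0.250 s here) barely changes throughput but dominates the worst-case response time, which is why both metrics are usually tracked together.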

Factors that degrade database performance can arise from the initial analysis and design stages through development and pre‑operation phases.

| Step | Optimization section | Content |
|---|---|---|
| Analysis | Business process optimization | Remove inefficient elements, and perform process optimization that aligns with the business vision and strategy. |
| Analysis | Architecture | Set the direction of the architecture by considering transaction throughput, performance, data growth trends, security, and availability. |
| Design | Physical design | Response time, distributed DB environment, concurrent user count, data size, parallel processing and distribution, centralization, design for redundancy |
| Design | Application design | Consider the access path, data request patterns, and indexes to achieve optimal performance when integrated with the DB. |
| Development | SQL | Comply with the performance policy by improving developers’ capabilities and adhering to development standards. |
| Operation | OS tuning | Perform tuning for CPU, memory, disk I/O, etc. |
| Operation | Network tuning | Perform tuning based on the transmission volume of data, files, etc. |
| Operation | DB tuning | Perform tuning on elements such as data architecture, parameters, and log files. |
| Operation | Application tuning | Continuously monitor the production system, and tune SQL and index policies, cluster policies, etc. for applications whose performance has degraded. |
Table. Database Optimization by Project Phase

Caching Implementation

Caching is a process that temporarily stores data or files at an intermediate location between the client and permanent storage to handle future requests more quickly and reduce network traffic.

Caching reuses previously retrieved data, which can increase application speed and reduce costs. The following table shows the caching mechanisms at each layer.

| Layer | Target | Caching implementation |
|---|---|---|
| Web layer | Web content | Improve web server content delivery latency → Content delivery using a Global CDN |
| Application layer | User session data | Improve application performance and data access performance using a key/value store and local cache → State management using CacheStore |
| Database layer | Data | Reduce database query latency using database buffers and key/value stores → Implement data caching with CacheStore, offload read load through Replica configuration |
Table. Hierarchical caching implementation

The performance efficiency of the web layer is primarily related to the delivery of static content such as images, videos, and HTML pages.

Static content delivered from a location geographically close to the user experiences reduced latency and faster response times.

By using a global CDN to implement caching, you can deliver content from locations close to the user, providing a better user experience.

By applying caching in the Application layer, you can store the results of complex repetitive requests, reducing business logic calculations and database accesses.
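
A local cache for repeated requests can be sketched with Python's standard `functools.lru_cache`. The `price_quote` function and its calculation are hypothetical stand-ins for costly business logic, and the counter exists only to show that the second call does not recompute.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive path actually runs


@lru_cache(maxsize=1024)
def price_quote(product_id: int, quantity: int) -> int:
    """Stand-in for costly business logic or a database round trip."""
    global call_count
    call_count += 1
    return product_id * 100 * quantity


first = price_quote(7, 3)   # computed
second = price_quote(7, 3)  # served from the local cache
```

A local cache like this is private to one server process, so cached results must be safe to serve slightly stale or be invalidated when the underlying data changes.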

Additionally, by implementing a state-management database and separating state storage from the application server, you can improve service performance while avoiding session loss or concentration during horizontal scaling of the servers.
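
Separating state from the application server can be sketched as follows. This is a minimal illustration: `SessionStore` is an in-memory stand-in for an external key/value store such as CacheStore, and the class and method names are assumptions rather than a real client API.

```python
class SessionStore:
    """In-memory stand-in for an external key/value store such as CacheStore."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class AppServer:
    """Stateless server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        cart = state.get("cart", []) + [item]
        self.store.put(session_id, {**state, "cart": cart})
        return cart


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)
server_a.add_to_cart("s1", "book")
cart = server_b.add_to_cart("s1", "pen")  # a different server sees the same session
```

Because neither server keeps the cart in its own memory, a load balancer can send each request to any instance, and instances can be added or removed without losing sessions.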

In general, the overall service speed and throughput are largely determined by the database performance.

For services that use relational databases, it is difficult to add write capacity by horizontally scaling the server, and vertical scaling has limits, so performance management requires considerable effort.

Applying caching to the database can significantly increase database throughput and reduce data retrieval latency.

Placing a Redis-based CacheStore (DBaaS) in front of the database, or configuring a replica of the Database service to distribute read load, is also an effective strategy for improving performance.
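
One common way to place a cache in front of a database is the cache-aside pattern: read from the cache first, load from the database on a miss, and invalidate the entry after a write. The following is a minimal sketch of that pattern, assuming a dictionary as a stand-in for CacheStore. The `CacheAside` class, `load_user` function, and 60-second time-to-live are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live; a dict stands in for CacheStore."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry and entry[0] > self._clock():
            return entry[1]      # cache hit
        value = self._load(key)  # cache miss: query the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


db_reads = []


def load_user(user_id):
    db_reads.append(user_id)  # record each simulated database query
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(load_user, ttl_seconds=60.0)
first = cache.get(1)
second = cache.get(1)
reads_before_invalidate = len(db_reads)
```

The time-to-live bounds how long stale data can be served if an invalidation is missed; the right value depends on how fresh the data must be.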