The page has been translated by Gen AI.

Data Storage Design

Select storage

Storage is one of the key factors that affect application performance.

All software applications interact with storage for installation, logging, and file access.

The optimal storage solution can vary depending on the following factors.

Factor	Storage considerations
Access pattern	Sequential access, random access
Access frequency	Online (Hot), Offline (Warm), Archive (Cold)
Update frequency	High (operating system, database volume), low (file storage, etc.)
Access availability	Single-instance connection, shared connection

Table. Storage selection considerations

The storage options provided by Samsung Cloud Platform are as follows.

Category	Block Storage	File Storage	Object Storage	Archive Storage
Function	Highly available storage that stores data in fixed-size blocks in a predefined arrangement and is allocated directly to a server	File storage that provides data access to heterogeneous clients over the network	Object storage service that lets users store and use data over the Internet	A storage service suitable for long-term retention of large-scale data
Access configuration	Direct VM connection / Multi-Attach	NFS/CIFS	REST API (S3 compatible)	Connect to Object Storage
Access control	-	Public IP / Server / VPC Endpoint	Public IP / Server / VPC Endpoint	Project public / private access settings
Encryption	Optional KMS-encrypted volume	AES256 encryption applied by default	Optional AES256 encryption	Optional AES256 encryption
Data protection	VM snapshot	Snapshot	Version control	-
Disk type	SSD	SSD/HDD	-	-
Capacity	Base OS volume: 16 GB to 12,288 GB; additional volumes: 8 GB to 12,288 GB, up to 23 can be added	No limit	No limit	No limit
Purpose	Operating systems, databases, and other high-throughput data storage	Web content management, storage for entertainment data processing, container storage, big data analysis	Storing objects such as web content, logs, etc.	Long-term preservation of large-scale data

Table. Samsung Cloud Platform storage options
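As a rough illustration of how the two tables above might drive a choice, the following sketch maps a workload's access protocol and retention need to one of the four storage options. The `Workload` type, the `suggest_storage` function, and the rule order are hypothetical and are not part of any Samsung Cloud Platform API; real selection should also weigh access frequency, update frequency, and capacity.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload's storage needs."""
    access_protocol: str            # "block", "nfs", "cifs", or "rest"
    long_term_retention: bool = False


def suggest_storage(workload: Workload) -> str:
    """Return a storage option name from the table above (illustrative rules only)."""
    if workload.long_term_retention:
        return "Archive Storage"    # long-term preservation of large-scale data
    if workload.access_protocol == "block":
        return "Block Storage"      # direct VM connection: OS and database volumes
    if workload.access_protocol in ("nfs", "cifs"):
        return "File Storage"       # shared access for heterogeneous clients
    if workload.access_protocol == "rest":
        return "Object Storage"     # S3-compatible REST API: web content, logs
    raise ValueError(f"unknown access protocol: {workload.access_protocol}")


print(suggest_storage(Workload("block")))
print(suggest_storage(Workload("nfs")))
print(suggest_storage(Workload("rest", long_term_retention=True)))
```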

Select Database

Generally, databases are used to standardize the common platform and improve management efficiency.

You must select an appropriate database based on data requirements, and an unsuitable choice can lead to increased system latency and performance degradation.

Choosing a database depends on the application’s requirements such as availability, scalability, data structure, throughput, durability, and so on.

Among the many factors to consider when choosing a database, access patterns have a significant impact on technology selection, so it is advisable to optimize the database based on them.

Most databases provide configuration options for workload optimization, and you can also review operational aspects such as scalability, backup, recovery, and maintenance, along with memory, cache, and storage optimization.

This section examines the database types and features that can meet the application’s database requirements.

OLTP (Online Transaction Processing)

Most traditional relational databases use the online transaction processing (OLTP) model.

Samsung Cloud Platform provides managed Database services for relational databases such as EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server.

Relational databases are suitable for applications that handle complex business transactions such as finance and e‑commerce, and they are advantageous for data aggregation and processing complex queries.

The considerations for relational database optimization are as follows.

Server type selection including computing, memory, storage, and networking

Storage volume configuration

Select a database engine that fits your needs

Database options such as schema, index, and view

Relational databases can increase throughput through vertical scaling, and horizontal scaling of read operations is also possible using replicas.
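The read scaling described above can be sketched as a small router that sends writes to the primary and rotates reads across replicas. This is a minimal illustration, not a managed Database service feature: the `RoutingPool` class and the endpoint names are hypothetical, and classifying statements by a leading `SELECT` is a simplification (reads that must see the latest write would still go to the primary).

```python
import itertools


class RoutingPool:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        """Return the endpoint that should execute the given statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Hypothetical endpoints for one primary and two read replicas.
pool = RoutingPool("primary:5432", ["replica-1:5432", "replica-2:5432"])
```

Adding another replica to the list increases read capacity without changing the application code that calls `endpoint_for`.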

OLAP (Online Analytical Processing)

To analyze large-scale structured data, you can use a data warehouse platform, and on the Samsung Cloud Platform, you can implement a column-oriented high-performance MPP analytics environment through Vertica (DBaaS).

The latest data warehouse technologies adopt a columnar format and use MPP (Massively Parallel Processing), which helps improve data analysis speed.

With a columnar format, a query that aggregates data from only a single column does not need to scan the entire table.

As a result, the amount of data scanned is reduced compared to the row format, and query performance improves.
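The difference can be illustrated with a simplified model that counts how many values each layout has to read in order to sum one column. The sample data and function names are made up for this example, and real engines read disk blocks rather than individual values.

```python
rows = [
    {"order_id": 1, "customer": "a", "region": "kr", "amount": 120},
    {"order_id": 2, "customer": "b", "region": "us", "amount": 80},
    {"order_id": 3, "customer": "c", "region": "kr", "amount": 200},
]

# Column layout: one list per column, built from the same data.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_row_format(rows):
    """Row format: every field of every row is read to reach 'amount'."""
    values_read = sum(len(row) for row in rows)
    return sum(row["amount"] for row in rows), values_read


def sum_amount_column_format(columns):
    """Column format: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_format(rows))        # (400, 12)
print(sum_amount_column_format(columns))  # (400, 3)
```

Both layouts return the same total, but the columnar layout reads 3 values instead of 12.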

An MPP system stores data by distributing it among the subordinate nodes, and the leader node coordinates query execution.

The leader node distributes queries to the subordinate nodes based on the partition key.

Each subordinate node then takes a portion of the query and processes it in parallel with the other nodes.

After that, the leader node collects query results from each subordinate node and returns the aggregated result.

This parallel processing method speeds up query execution and enables larger volumes of data to be processed more quickly.
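The leader and subordinate flow described above can be sketched as a scatter-gather aggregation. This is a toy model under stated assumptions: threads stand in for subordinate nodes, `NUM_NODES` and all function names are hypothetical, and the partitioning uses a simple CRC32 hash of the key rather than any particular product's scheme.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(partition_key: str) -> int:
    """Pick a node with a stable hash of the partition key."""
    return zlib.crc32(partition_key.encode()) % NUM_NODES


def distribute(records):
    """Place each (key, amount) record on a node according to its key."""
    nodes = [[] for _ in range(NUM_NODES)]
    for key, amount in records:
        nodes[node_for(key)].append((key, amount))
    return nodes


def partial_sum(node_records):
    """Subordinate step: each node aggregates only its own partition."""
    return sum(amount for _, amount in node_records)


def leader_total(records):
    """Leader step: fan the query out, then combine the partial results."""
    nodes = distribute(records)
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(partial_sum, nodes))
    return sum(partials)


sales = [("kr", 120), ("us", 80), ("jp", 200), ("kr", 50)]
print(leader_total(sales))  # 450
```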

NoSQL

In various applications such as social media, the Internet of Things, clickstream data, and logs, large amounts of unstructured and semi-structured data are generated.

Such data has a dynamic schema, and each record can have a different structure.

Storing such data in a relational database can be inefficient.

Relational databases must store data based on a fixed schema, which can result in unnecessary null values being stored or data loss occurring.

NoSQL databases can store data flexibly without being constrained by a fixed schema.

Each record can have a different number of columns and can be stored in the same table.
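A small example makes the contrast concrete. The three records below are invented for illustration; they are stored as-is in a schemaless collection, then forced into a single fixed set of columns to show how many null values that would introduce.

```python
# Schemaless storage: each record keeps only the fields it actually has.
events = [
    {"id": 1, "type": "click", "url": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "unit": "C"},
    {"id": 3, "type": "log", "level": "WARN", "message": "disk 80% full"},
]

# Fixed schema: every row must carry every column, so missing fields become None.
all_columns = sorted({field for event in events for field in event})
fixed_rows = [[event.get(col) for col in all_columns] for event in events]
null_count = sum(value is None for row in fixed_rows for value in row)

print(all_columns)
print(null_count)  # 10 of the 21 cells are null
```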

NoSQL databases can store large volumes of data and provide low latency.

They can also be expanded easily by adding nodes when needed, because they natively support horizontal scaling.

However, NoSQL databases generally do not support complex queries such as table and entity joins, so a relational database is more appropriate when such queries are required.

In Samsung Cloud Platform, you can use CacheStore as a Redis-based in-memory database, which can be used for high-performance database caching or for storing application state.

Data Search

There are cases where you need to quickly search large amounts of data to promptly resolve issues or gain business insights.

Searching application data provides access to detailed information and helps analyze it from various perspectives.

To retrieve data with low latency and high throughput, search engine technology must be used.

Samsung Cloud Platform provides a Search Engine service.

Search Engine automates the creation and configuration of Elasticsearch for data analysis.

The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configurations.
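Search engines such as Elasticsearch achieve low-latency lookups largely by maintaining an inverted index, which maps each term to the documents that contain it. The following toy version is only a sketch of that idea; the function names and sample documents are hypothetical, and it omits tokenization rules, relevance scoring, and distribution across a cluster.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "payment timeout error", 2: "login error", 3: "payment completed"}
index = build_index(docs)
print(search(index, "payment error"))  # {1}
```

A query only touches the entries for its own terms, so lookup cost does not grow with the total number of documents in the way a full scan would.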

Database Performance Improvement

DB Optimization

Database performance improvement means designing and operating the database so that it maintains its performance for as long as possible. Efficiency from a business-wide perspective, such as response time and throughput per unit time, is more important than managing the performance of the server alone.

Database performance degrades due to continuous business changes, increasing concurrent users, and ongoing data growth.

Generally, database performance is defined as the response time to a user’s request.

Optimal database performance can be defined as achieving maximum performance with minimal resources.
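The two measures mentioned above, response time and throughput per unit time, can be collected with a small timing harness. This is a minimal sketch: `measure` and the stand-in workload are hypothetical, and a real measurement would run against the actual database under realistic concurrency.

```python
import time


def measure(operation, requests):
    """Run `operation` once per request and report latency and throughput."""
    latencies = []
    started = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        operation(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    if not latencies:
        raise ValueError("at least one request is required")
    return {
        "requests": len(latencies),
        "avg_response_s": sum(latencies) / len(latencies),
        "max_response_s": max(latencies),
        # Guard against a zero interval on very coarse clocks.
        "throughput_per_s": len(latencies) / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in for a database call; replace with a real query function.
stats = measure(lambda request: sum(range(10_000)), range(100))
print(stats)
```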

Factors that degrade database performance can arise from the initial analysis and design stages through development and pre‑operation phases.

Step	Optimization area	Description
Analysis	Business process optimization	Remove inefficient elements and optimize processes in line with the business vision and strategy.
Analysis	Architecture	Set the direction of the architecture by considering transaction throughput, performance, data growth trends, security, and availability.
Design	Physical design	Consider response time, the distributed DB environment, the number of concurrent users, data size, parallel processing and distribution, centralization, and redundancy.
Design	Application design	Consider access paths, data request patterns, and indexes to achieve optimal performance when integrating with the DB.
Development	SQL	Comply with the performance policy by improving developers’ capabilities and adhering to development standards.
Operation	OS tuning	Perform tuning for CPU, memory, disk I/O, etc.
Operation	Network tuning	Perform tuning based on the transmission volume of data, files, etc.
Operation	DB tuning	Perform tuning on elements such as data architecture, parameters, and log files.
Operation	Application tuning	Continuously monitor production SQL, and tune index policies, cluster policies, etc. for applications whose performance has degraded.

Table. Database Optimization by Project Phase

Caching Implementation

Caching is a process that temporarily stores data or files at an intermediate location between the client and permanent storage to handle future requests more quickly and reduce network traffic.

Caching reuses previously retrieved data, which can increase application speed and reduce costs. The following table shows the caching mechanisms at each layer.

Layer	Target	Caching implementation
Web layer	Web content	Reduce web server content delivery latency → Deliver content using a Global CDN
Application layer	User session data	Improve application performance and data access performance using a key/value store and local cache → Manage state using CacheStore
Database layer	Data	Reduce database query latency using database buffers and key/value stores → Implement data caching with CacheStore, offload read load through Replica configuration

Table. Caching implementation by layer

The performance efficiency of the web layer is primarily related to the delivery of static content such as images, videos, and HTML pages.

Static content delivered from a location geographically close to the user experiences reduced latency and faster response times.

By using a global CDN to implement caching, you can deliver content from locations close to the user, providing a better user experience.

By applying caching in the Application layer, you can store the results of complex repetitive requests, reducing business logic calculations and database accesses.
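Within a single process, caching the result of a repeated request can be as simple as memoizing the function. The sketch below uses Python's standard `functools.lru_cache`; the `shipping_quote` function and its pricing rules are hypothetical stand-ins for expensive business logic and database access.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive work actually runs


@lru_cache(maxsize=1024)
def shipping_quote(region: str, weight_kg: int) -> int:
    """Hypothetical expensive calculation (stands in for logic plus DB access)."""
    calls["count"] += 1
    base = {"kr": 3000, "us": 15000}.get(region, 20000)
    return base + 500 * weight_kg


print(shipping_quote("kr", 2))  # computed on the first call
print(shipping_quote("kr", 2))  # served from the cache
print(calls["count"])           # 1
```

A local cache like this is private to each server process, so results that must be shared across servers belong in a shared key/value store instead.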

Additionally, by implementing a state-management database and separating state storage from the application server, you can improve service performance while avoiding session loss or concentration during horizontal scaling of the servers.
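The separation of state storage from the application server can be sketched as follows. `SessionStore` is an in-memory stand-in for a shared key/value store such as CacheStore, and both class names are hypothetical; in practice each server would connect to the same external store over the network.

```python
class SessionStore:
    """In-memory stand-in for a shared key/value store such as CacheStore."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class AppServer:
    """Stateless server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        state = self.store.get(session_id)
        cart = state.get("cart", []) + [item]
        self.store.put(session_id, {**state, "cart": cart})
        return cart


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)  # added later by horizontal scaling

server_a.add_to_cart("session-1", "book")
print(server_b.add_to_cart("session-1", "pen"))  # ['book', 'pen']
```

Because neither server keeps the cart in its own memory, a request for the same session can land on any server without losing state.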

In general, the overall service speed and throughput are determined by the database performance.

For services that use relational databases, it is difficult to increase write capacity by horizontally scaling the server, and vertical scaling has limits, so performance management requires considerable effort.

Applying caching to the database can significantly increase database throughput and reduce data retrieval latency.

Placing a Redis-based CacheStore (DBaaS) in front of the database, or configuring a replica of the Database service to distribute read load, is also an effective strategy for improving performance.
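One common way to place a cache in front of a database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, and store the result with a time-to-live. The sketch below is an assumption about how this might be wired, not a CacheStore API; a dictionary stands in for Redis, and `CacheAside`, `load_from_db`, and the TTL value are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live; a dict stands in for Redis."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)
        self.db_reads = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]              # cache hit: no database access
        value = self._load(key)          # cache miss or expired: read the database
        self.db_reads += 1
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


# Hypothetical loader standing in for a SELECT against the database.
cache = CacheAside(lambda key: f"row-{key}", ttl_seconds=30)
cache.get(1)
cache.get(1)
print(cache.db_reads)  # 1
```

The TTL bounds how stale a cached row can become, and `invalidate` covers the case where the application itself has just updated the row.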