Data Storage Design
Storage Selection
Storage is one of the key elements that affects application performance.
All software applications interact with storage for installation, logging, and file access.
The optimal storage solution may vary depending on the following factors.
| Factor | Considerations |
|---|---|
| Access Pattern | Sequential access, random access |
| Access Frequency | Online (Hot), offline (Warm), archive (Cold) |
| Update Frequency | High update frequency (operating system, database volume), low update frequency (file storage, etc.) |
| Access Availability | Single instance connection, shared connection |
The storage options provided by Samsung Cloud Platform are as follows.
| Category | Block Storage | File Storage | Object Storage | Archive Storage |
|---|---|---|---|---|
| Function | Data is stored in fixed-size blocks and directly assigned to servers for high-availability storage service | Provides file-level storage for heterogeneous clients over the network | Allows users to store and use data on the internet as an object storage service | Storage service suitable for long-term storage of large amounts of data |
| Access Configuration | VM direct connection / Multi-Attach | NFS / CIFS | REST API (S3 compatible) | Connected to Object Storage |
| Access Control | - | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project public / private access setting |
| Encryption | KMS encryption volume selection | AES256 encryption applied by default | AES256 encryption optional | AES256 encryption optional |
| Data Protection | VM snapshot | Snapshot | Version management | - |
| Disk | SSD | SSD / HDD | - | - |
| Capacity | OS volume: 16 GB-12,288 GB; additional volumes: 8 GB-12,288 GB each, up to 23 additional volumes | No limit | No limit | No limit |
| Purpose | Operating system, database, and other high-throughput data storage | Web content management, entertainment data processing, container storage, big data analysis | Web content, log, and other object storage | Long-term preservation of large amounts of data |
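Since Object Storage is accessed through an S3-compatible REST API, each object maps to an HTTP verb against a URL built from the endpoint, bucket, and object key. The sketch below constructs (but does not send) such a request; the endpoint, bucket, and key are hypothetical placeholders, and a real client would also sign the request with service credentials.

```python
import urllib.request

# Hypothetical endpoint and bucket; actual values come from the
# Object Storage service configuration.
ENDPOINT = "https://objectstorage.example.com"
BUCKET = "my-bucket"
KEY = "logs/2024/app.log"

# An object write is a PUT against endpoint/bucket/key.
url = f"{ENDPOINT}/{BUCKET}/{KEY}"
request = urllib.request.Request(
    url,
    data=b"log line\n",  # object payload
    method="PUT",
    headers={"Content-Type": "text/plain"},
)
# The request is only constructed here; sending it requires credentials
# and request signing, which an S3-compatible SDK normally handles.
```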
Database Selection
In general, organizations standardize on common database platforms to increase management efficiency.
The appropriate database should be selected based on data requirements, and incorrect selection can lead to increased system latency and performance degradation.
Database selection varies depending on factors such as availability, scalability, data structure, throughput, and durability, which are required by the application.
When selecting a database, access patterns have a significant impact on the choice of technology, so it is desirable to optimize the database around them.
Most databases provide configuration options for workload optimization, and operational aspects such as memory, cache, storage optimization, scalability, backup, recovery, and maintenance can be reviewed together.
In this document, we will explore various features to meet the database requirements of applications.
- OLTP (Online Transaction Processing)
Most traditional relational databases are designed for online transaction processing (OLTP).
Samsung Cloud Platform provides managed database services for relational databases, including EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server.
Relational databases are suitable for applications that process complex business transactions, such as finance and e-commerce, and are advantageous for data aggregation and complex query processing.
Considerations for optimizing relational databases include:
- Selecting server types, including computing, memory, storage, and networking
- Configuring storage volumes
- Selecting the appropriate database engine
- Database options such as schema, index, and view
Relational databases can increase throughput through vertical scaling and can also scale horizontally for read operations using replicas.
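Among the options above, indexing is one of the most direct levers for OLTP workloads. The following minimal sketch uses SQLite as a stand-in for any relational engine to show an index changing the query plan from a full scan to an index search; the table and column names are illustrative.

```python
import sqlite3

# In-memory database as a stand-in for a managed relational service.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Without an index, filtering by customer_id scans the whole table.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
).fetchall()
# The plan detail now reports an index search instead of a full scan.
print(plan[0][-1])
```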
- OLAP (Online Analytical Processing)
For analyzing large amounts of structured data, a data warehouse platform can be used, and Samsung Cloud Platform provides a column-based high-performance MPP analysis environment through Vertica (DBaaS).
Modern data warehouse technologies adopt columnar formats and use MPP (Massively Parallel Processing) to improve data analysis speed.
With a columnar format, aggregating data from a single column does not require scanning the entire table.
This reduces the amount of data scanned, resulting in improved query performance compared to row formats. MPP distributes data across worker nodes, while a leader node coordinates query execution.
The leader node distributes a query to the worker nodes based on the partition key.
Each worker node then executes its portion of the query in parallel.
Finally, the leader node collects the partial results from each worker node and returns the aggregated result.
Through this parallel processing, queries complete faster and large amounts of data can be processed more quickly.
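The columnar and MPP ideas can be illustrated with a toy sketch (plain Python lists standing in for column storage and worker nodes): summing one column never touches the others, and the leader combines partial sums from each partition.

```python
# Columns are stored independently, so aggregating "amount"
# does not read "region" at all.
columns = {
    "region": ["KR", "US", "KR", "EU"],
    "amount": [100, 250, 75, 300],
}

# The "leader" splits the amount column across two worker partitions...
partitions = [columns["amount"][:2], columns["amount"][2:]]
# ...each "worker" aggregates its partition (in parallel on real MPP)...
partial_sums = [sum(p) for p in partitions]
# ...and the leader combines the partial results.
total = sum(partial_sums)
print(total)  # 725
```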
- NoSQL
In various applications such as social media, the Internet of Things, clickstream data, and logs, a large amount of unstructured and semi-structured data is generated.
This data has a dynamic schema, and each record can have a different structure.
Storing this data in a relational database can be inefficient.
Relational databases must store data based on a fixed schema, so unnecessary null values may be stored, or data loss may occur.
Unstructured or NoSQL databases can store data flexibly without being bound by a fixed schema.
Records with different numbers of columns can be stored in the same table.
NoSQL databases can store large amounts of data and provide low latency.
Additionally, nodes can easily be added as needed, and horizontal scaling is supported by default.
However, since NoSQL databases do not support complex queries such as table and entity joins, using a relational database is more suitable in such cases.
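The dynamic-schema property can be sketched with plain dicts standing in for NoSQL documents (the field names are hypothetical): each record carries only the fields it actually has, with no NULL padding for the union of all columns.

```python
# Heterogeneous records coexist in one "table" (a list of dicts).
events = [
    {"type": "click", "url": "/home", "user": "a"},
    {"type": "sensor", "device_id": 7, "temp_c": 21.5},
    {"type": "log", "level": "warn", "msg": "disk 80% full"},
]

# A fixed relational schema would need the union of all fields,
# mostly NULL for each row; here each record stores only its own.
field_sets = [set(e) for e in events]
assert field_sets[0] != field_sets[1]  # different structures, same store
```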
On the Samsung Cloud Platform, CacheStore can be used as an in-memory database based on Redis, which can be used for high-performance database caching or application state storage.
- Data Search
There are cases where a large amount of data needs to be searched quickly to solve problems or gain business insights.
Searching application data helps access detailed information and analyze it from various perspectives.
To search data with low latency and high throughput, search engine technology must be used.
The Samsung Cloud Platform provides a Search Engine service.
The Search Engine service automates the creation and setup of Elasticsearch for data analysis.
The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configuration.
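Searches against such an engine are typically expressed as a JSON query body sent to a search endpoint. The sketch below builds an Elasticsearch-style bool query (the field names are hypothetical) combining a full-text match with a time-range filter.

```python
import json

# Query body for a hypothetical log index: full-text match on
# "message", filtered to the last hour, returning up to 20 hits.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "timeout"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    },
    "size": 20,
}
# Serialized form that would be POSTed to the index's _search endpoint.
payload = json.dumps(query)
```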
Database Performance Improvement
DB Optimization
Database performance improvement refers to designing and operating a database so that it maintains its performance for as long as possible. What matters is the efficiency of the business as a whole, such as response speed and throughput per unit time, rather than server performance management alone.
As the business continues to change, the number of concurrent users increases, and the amount of data continues to grow, database performance deteriorates.
Generally, database performance is defined as the response time to user requests.
Optimal database performance means achieving the best performance with the minimum resources.
Factors that deteriorate database performance can occur from the initial analysis and design stages to the development and operation stages.
| Stage | Optimization Section | Content |
|---|---|---|
| Analysis | Business Process Optimization | Remove inefficient elements, perform process optimization that fits the business vision and strategy |
| Analysis | Architecture | Set the direction of the architecture considering transaction throughput, performance, data growth trend, security, and availability |
| Design | Physical Design | Perform design considering response time, distributed DB environment, number of concurrent users, data size, parallel processing, and distribution, concentration, and redundancy |
| Design | Application Design | Design to achieve optimal performance in conjunction with the DB, access path, data request type, and index |
| Development | SQL | Improve developer skills and develop standards to comply with performance policies |
| Operation | OS Tuning | Perform tuning for CPU, memory, disk I/O, etc. |
| Operation | Network Tuning | Perform tuning according to the amount of data, files, etc. transferred |
| Operation | DB Tuning | Perform tuning for data architecture, parameters, log files, etc. |
| Operation | Application Tuning | Continuously monitor the operating system and perform tuning by reflecting SQL, index policy, cluster policy, etc. for applications with poor performance |
Caching Implementation
Caching is the process of temporarily storing data or files in an intermediate location between the client and permanent storage to serve future requests more quickly and reduce network load.
Caching can improve application speed and reduce costs by reusing previously searched data. The following content shows the caching mechanism at each level.
| Level | Target | Caching Implementation |
|---|---|---|
| Web Layer | Web Content | Improve web server content transmission delay → Use Global CDN for content transmission |
| Application Layer | User Session Data | Use key/value storage and local cache to improve application performance and data access performance → Use CacheStore for state management |
| Database Layer | Data | Use database buffer and key/value storage to reduce latency when requesting database queries → Implement data caching using CacheStore, and offload read load using replica configuration |
The performance efficiency of the web layer is mainly related to the transmission of static content such as images, videos, and HTML pages.
This static content can be provided from a location closer to the user, reducing latency and allowing for faster response.
Using Global CDN for caching allows content to be transmitted from a location closer to the user, providing a better user experience.
By applying caching to the application layer, the results of complex repeated requests can be stored, reducing business logic calculations and database access. Furthermore, implementing a state management database to separate state storage from the application server allows you to improve service performance while avoiding session loss or concentration when scaling servers horizontally.
In general, the speed and throughput of the entire service depend on the performance of the database.
For services that use relational databases, write capacity cannot easily be increased by scaling servers horizontally, and vertical scaling has limits, so considerable effort is required for performance management.
Applying caching to the database can greatly increase database throughput and reduce data search wait times.
Placing a Redis-based CacheStore in front of the database or configuring a replica of the Database service to distribute read loads is also an effective strategy for improving performance.
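Offloading reads to replicas usually requires a small routing layer that sends writes to the primary and distributes reads across replicas. The sketch below shows one simple policy, round-robin read routing based on the statement type; the connection names are hypothetical placeholders for real database connections.

```python
import itertools

class Router:
    """Routes SQL statements: writes to the primary, reads round-robin
    across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # SELECTs can be offloaded to replicas; everything else goes
        # to the primary, which remains the single source of truth.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = Router("primary", ["replica-1", "replica-2"])
targets = [router.route(q) for q in (
    "SELECT * FROM orders",
    "INSERT INTO orders VALUES (1)",
    "SELECT 1",
)]
print(targets)  # ['replica-1', 'primary', 'replica-2']
```

Note that replicas lag the primary slightly, so reads that must observe a just-committed write should still be directed to the primary.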