Data Storage Design

Storage Selection

Storage is one of the key elements that affects application performance.

All software applications interact with storage for installation, logging, and file access.

The optimal storage solution may vary depending on the following factors.

| Access Method | Storage Considerations |
|---|---|
| Access Pattern | Sequential access, random access |
| Access Frequency | Online (Hot), offline (Warm), archive (Cold) |
| Update Frequency | High update frequency (operating system, database volume), low update frequency (file storage, etc.) |
| Access Availability | Single instance connection, shared connection |
Table. Storage Selection Considerations

The storage options provided by Samsung Cloud Platform are as follows.

| Category | Block Storage | File Storage | Object Storage | Archive Storage |
|---|---|---|---|---|
| Function | Data is stored in fixed-size blocks and directly assigned to servers for high-availability storage service | Provides file-level storage for heterogeneous clients over the network | Allows users to store and use data on the internet as an object storage service | Storage service suitable for long-term storage of large amounts of data |
| Access Configuration | VM direct connection / Multi-Attach | NFS / CIFS | REST API (S3 compatible) | Connected to Object Storage |
| Access Control | - | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project public / private access setting |
| Encryption | KMS encryption volume selection | AES256 encryption applied by default | AES256 encryption optional | AES256 encryption optional |
| Data Protection | VM snapshot | Snapshot | Version management | - |
| Disk | SSD | SSD / HDD | - | - |
| Capacity | Default OS: 16-12,288 GB; additional volume: 8-12,288 GB; up to 23 additional volumes | No limit | No limit | No limit |
| Purpose | Operating system, database, and other high-throughput data storage | Web content management, entertainment data processing, container storage, big data analysis | Web content, log, and other object storage | Long-term preservation of large amounts of data |
Table. Samsung Cloud Platform Storage Options
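
As a rough illustration of how the two tables combine, the sketch below maps a few workload traits to one of the four storage options. The `Workload` fields and the order of the checks are assumptions made for this example, not an official selection rule; a real decision would also weigh encryption, capacity, and data protection.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload traits, loosely based on the considerations table."""
    long_term_archive: bool = False      # cold data kept for long periods
    accessed_via_rest_api: bool = False  # objects read and written over HTTP
    shared_file_access: bool = False     # several clients mount the same files


def suggest_storage(workload: Workload) -> str:
    """Return a storage option name using a simple first-match heuristic."""
    if workload.long_term_archive:
        return "Archive Storage"
    if workload.accessed_via_rest_api:
        return "Object Storage"
    if workload.shared_file_access:
        return "File Storage"
    # Default: a volume attached directly to a server (OS, database).
    return "Block Storage"
```

For example, `suggest_storage(Workload(shared_file_access=True))` returns `"File Storage"`, which matches the NFS / CIFS shared-access column of the table.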

Database Selection

In general, organizations standardize on a common database platform to make management more efficient.

However, the database should be selected based on data requirements, because choosing the wrong one can increase system latency and degrade performance.

Database selection varies depending on factors such as availability, scalability, data structure, throughput, and durability, which are required by the application.

Access patterns have a significant impact on technology selection, so the database should be chosen and optimized around them.

Most databases provide configuration options for workload optimization, and operational aspects such as memory, cache, storage optimization, scalability, backup, recovery, and maintenance can be reviewed together.

In this document, we will explore various features to meet the database requirements of applications.

  • OLTP (Online Transaction Processing)

Most traditional relational databases are designed for online transaction processing (OLTP).

Samsung Cloud Platform provides managed database services for relational databases, including EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server.

Relational databases are suitable for applications that process complex business transactions, such as finance and e-commerce, and are advantageous for data aggregation and complex query processing.

Considerations for optimizing relational databases include:

  • Selecting server types, including computing, memory, storage, and networking
  • Configuring storage volumes
  • Selecting the appropriate database engine
  • Database options such as schema, index, and view

Relational databases can increase throughput through vertical scaling and can also scale horizontally for read operations using replicas.
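
The read-scaling idea can be sketched as a small router that sends writes to the primary and spreads reads across replicas. The class name, the endpoint names, and the rule that any statement starting with `SELECT` is a read are assumptions for illustration only; they are not part of any Samsung Cloud Platform API.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def target_for(self, sql: str) -> str:
        # Naive classification: statements starting with SELECT are reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


# Hypothetical endpoint names.
router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
```

Because replicas usually apply changes with some delay, reads that must see the latest write are normally sent to the primary instead.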

  • OLAP (Online Analytical Processing)

For analyzing large amounts of structured data, a data warehouse platform can be used, and Samsung Cloud Platform provides a column-based high-performance MPP analysis environment through Vertica (DBaaS).

Recent data warehouse technologies adopt column formats and use MPP (Massively Parallel Processing) to improve data analysis speed.

Using column formats means that when aggregating data from only one column, it is not necessary to scan the entire table.

This reduces the amount of data scanned, resulting in improved query performance compared to row formats.

MPP stores data in a distributed manner across worker nodes, and queries are submitted to the leader node.

The leader node distributes each query to the worker nodes based on the partition key.

Each node then takes a portion of the query and processes it in parallel.

After that, the leader node collects the query results from each worker node and returns the aggregated result.

This parallel processing speeds up query execution, so large amounts of data can be processed more quickly.
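
The scatter-gather flow described above can be sketched in a few lines. This is a toy model under stated assumptions: threads stand in for nodes, `NUM_WORKERS` and the function names are invented for the example, and a CRC32 hash of the partition key decides which node stores each row. It does not reflect how Vertica or any specific MPP engine is implemented.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 3  # hypothetical number of nodes holding data


def partition(rows, key):
    """Distribute rows across nodes by hashing the partition key."""
    shards = [[] for _ in range(NUM_WORKERS)]
    for row in rows:
        node = zlib.crc32(str(row[key]).encode()) % NUM_WORKERS
        shards[node].append(row)
    return shards


def node_sum(shard, column):
    """Each node aggregates only the rows it stores."""
    return sum(row[column] for row in shard)


def leader_sum(shards, column):
    """The leader scatters the query, then gathers and merges partial results."""
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        partials = list(pool.map(lambda shard: node_sum(shard, column), shards))
    return sum(partials)


rows = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 20},
    {"region": "east", "amount": 5},
    {"region": "north", "amount": 7},
    {"region": "south", "amount": 8},
]
shards = partition(rows, "region")
total = leader_sum(shards, "amount")
```

Each node sums only its own shard, so the leader merges a handful of partial results instead of scanning every row itself.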

  • NoSQL

In various applications such as social media, the Internet of Things, clickstream data, and logs, a large amount of unstructured and semi-structured data is generated.

This data has a dynamic schema, and each record can have a different structure.

Storing this data in a relational database can be inefficient.

Relational databases must store data based on a fixed schema, so unnecessary null values may be stored, or data loss may occur.

NoSQL (non-relational) databases can store data flexibly without being bound by a fixed schema.

Records with different numbers of columns can be stored in the same table.
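
To make the fixed-schema cost concrete, the sketch below takes three records with different fields and forces them into one fixed set of columns, counting the null values that result. The sample records and field names are invented for illustration.

```python
# Three records with different structures, as a document store might hold them.
records = [
    {"id": 1, "type": "click", "url": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "unit": "C"},
    {"id": 3, "type": "log", "level": "WARN", "message": "disk usage high", "host": "web-1"},
]

# Fixed schema: every row carries every column, so missing fields become None (NULL).
columns = sorted({key for record in records for key in record})
fixed_rows = [[record.get(column) for column in columns] for record in records]

total_cells = len(columns) * len(fixed_rows)
null_count = sum(value is None for row in fixed_rows for value in row)
```

Here the fixed layout needs 8 columns and 24 cells, of which 12 are null, while the flexible form stores only the 12 values that actually exist.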

NoSQL databases can store large amounts of data and provide low latency.

Additionally, nodes can be easily added as needed, and horizontal expansion is supported by default.

However, NoSQL databases generally offer limited support for complex queries such as joins across tables and entities, so a relational database is more suitable in such cases.

On Samsung Cloud Platform, CacheStore is a Redis-based in-memory database that can be used for high-performance database caching or application state storage.

  • Data Search

There are cases where a large amount of data needs to be searched quickly to solve problems or gain business insights.

Searching application data helps access detailed information and analyze it from various perspectives.

Searching data with low latency and high throughput typically requires search engine technology.

The Samsung Cloud Platform provides a Search Engine service.

The Search Engine automates the creation and setup of Elasticsearch for data analysis.

The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configuration.
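
Search engines of this kind are fast largely because they look terms up in an inverted index rather than scanning every document. The sketch below is a minimal in-memory version of that idea; the function names and sample documents are assumptions for illustration, and it omits the analysis, ranking, and distribution that a real engine provides.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    term_sets = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()


docs = {
    1: "payment service timeout error",
    2: "login service started",
    3: "payment completed",
}
index = build_index(docs)
```

A query such as `search(index, "payment service")` intersects two small ID sets instead of reading all three documents.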

Database Performance Improvement

DB Optimization

Database performance improvement refers to designing and operating a database so that it maintains its performance for as long as possible. The efficiency of the business as a whole, such as response speed or throughput per unit time, matters more than managing the performance of individual servers.

As the business continues to change, the number of concurrent users increases, and the amount of data continues to grow, database performance deteriorates.

Generally, database performance is defined as the response time to user requests.

Optimal database performance means achieving the best performance with the minimum resources.

Factors that deteriorate database performance can occur from the initial analysis and design stages to the development and operation stages.

| Stage | Optimization Section | Content |
|---|---|---|
| Analysis | Business Process Optimization | Remove inefficient elements, perform process optimization that fits the business vision and strategy |
| Analysis | Architecture | Set the direction of the architecture considering transaction throughput, performance, data growth trend, security, and availability |
| Design | Physical Design | Perform design considering response time, distributed DB environment, number of concurrent users, data size, parallel processing, and distribution, concentration, and redundancy |
| Design | Application Design | Design to achieve optimal performance in conjunction with the DB, access path, data request type, and index |
| Development | SQL | Improve developer skills and develop standards to comply with performance policies |
| Operation | OS Tuning | Perform tuning for CPU, memory, disk I/O, etc. |
| Operation | Network Tuning | Perform tuning according to the amount of data, files, etc. transferred |
| Operation | DB Tuning | Perform tuning for data architecture, parameters, log files, etc. |
| Operation | Application Tuning | Continuously monitor the operating system and perform tuning by reflecting SQL, index policy, cluster policy, etc. for applications with poor performance |
Table. Database Optimization Section by Project Stage
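
As one concrete instance of the SQL and index tuning listed in the table, the sketch below shows how adding an index changes a query plan. It uses Python's built-in `sqlite3` module purely as a stand-in for a managed database; the `orders` table, the index name, and the `query_plan` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)


def query_plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


sql = "SELECT amount FROM orders WHERE customer_id = 42"
plan_before = query_plan(sql)  # no usable index yet, so the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(sql)   # the planner can now search via the index
```

Comparing `plan_before` with `plan_after` should show the plan moving from a table scan to a search that uses `idx_orders_customer`. The exact wording of the plan text varies by SQLite version, and other database engines expose plans through their own `EXPLAIN` variants.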

Caching Implementation

Caching is the process of temporarily storing data or files in an intermediate location between the client and permanent storage so that future requests are processed more quickly and network load is reduced.

Caching can improve application speed and reduce costs by reusing previously retrieved data. The following table shows the caching mechanism at each level.

| Level | Target | Caching Implementation |
|---|---|---|
| Web Layer | Web Content | Improve web server content transmission delay → Use Global CDN for content transmission |
| Application Layer | User Session Data | Use key/value storage and local cache to improve application performance and data access performance → Use CacheStore for state management |
| Database Layer | Data | Use database buffer and key/value storage to reduce latency when requesting database queries → Implement data caching using CacheStore, and offload read load using replica configuration |
Table. Caching Implementation by Level

The performance efficiency of the web layer is mainly related to the transmission of static content such as images, videos, and HTML pages.

This static content can be provided from a location closer to the user, reducing latency and allowing for faster response.

Using Global CDN for caching allows content to be transmitted from a location closer to the user, providing a better user experience.

By applying caching to the application layer, the results of complex repeated requests can be stored, reducing business logic calculations and database access. Furthermore, implementing a state management database to separate state storage from the application server allows you to improve service performance while avoiding session loss or concentration when scaling servers horizontally.
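
A minimal sketch of application-layer caching is shown below: results of an expensive computation are kept for a limited time and reused for repeated requests. The `TTLCache` class and its parameters are assumptions for illustration. It is an in-process cache, so in a horizontally scaled service the same pattern would normally sit on a shared store such as CacheStore instead.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (value, expires_at)

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: skip the computation
        value = compute()    # missing or expired: recompute and store
        self._store[key] = (value, now + self.ttl)
        return value
```

A caller would wrap the costly step, for example `cache.get_or_compute("report:2024", build_report)`, where `build_report` is a hypothetical function containing the business logic or database access to avoid repeating.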

In general, the speed and throughput of the entire service depend on the performance of the database.

For services that use relational databases, write capacity cannot easily be increased by scaling servers horizontally, and vertical scaling has its limits, so performance management requires considerable effort.

Applying caching to the database can greatly increase database throughput and reduce data search wait times.

Placing a Redis-based CacheStore in front of the database or configuring a replica of the Database service to distribute read loads is also an effective strategy for improving performance.
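
Placing a cache in front of the database is commonly implemented with the cache-aside pattern, sketched below. A plain dictionary stands in for the Redis-based cache and another for the database; the class name, the key format, and the `db_reads` counter are assumptions for this example rather than CacheStore features.

```python
class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # dict standing in for a Redis-based cache
        self.load_from_db = load_from_db  # callable that queries the database
        self.db_reads = 0                 # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit: no database access
        self.db_reads += 1
        value = self.load_from_db(key)    # cache miss: query the database
        self.cache[key] = value
        return value

    def invalidate(self, key):
        # Call after a write so the next read reloads fresh data.
        self.cache.pop(key, None)


database = {"user:1": {"name": "Kim"}}  # hypothetical stand-in for a table
reader = CacheAside(cache={}, load_from_db=database.__getitem__)
```

Repeated reads of the same key reach the database only once. After a write, calling `invalidate` keeps the cache from serving stale data, at the cost of one extra database read.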