Data Storage Design
Select storage
Storage is one of the key factors affecting application performance.
All software applications interact with storage for installation, logging, and file access.
The optimal storage solution can vary depending on the following factors.
| Factor | Storage considerations |
|---|---|
| Access pattern | Sequential access, random access |
| Access frequency | Online (Hot), Offline (Warm), Archive (Cold) |
| Update frequency | High (operating system, database volume), low (file storage, etc.) |
| Access availability | Single-instance connection, shared connection |
The storage options provided by Samsung Cloud Platform are as follows.
| Category | Block Storage | File Storage | Object Storage | Archive Storage |
|---|---|---|---|---|
| Function | High-availability storage that stores data in fixed-size blocks of a predefined array and is allocated directly to a server | File storage that provides data access to heterogeneous clients over the network | Object storage service that lets users store and use data over the Internet | A storage service suitable for long-term retention of large-scale data |
| Access configuration | Direct VM connection / Multi-Attach | NFS/CIFS | REST API (S3 compatible) | Connect to Object Storage |
| Access control | - | Public IP / Server / VPC Endpoint | Public IP / Server / VPC Endpoint | Project public / private access settings |
| Encryption | KMS-encrypted volume (selectable) | AES256 encryption (applied by default) | AES256 encryption (selectable) | AES256 encryption (selectable) |
| Data protection | VM snapshot | Snapshot | Version control | - |
| Disk | SSD | SSD/HDD | - | - |
| Capacity | | No limit | No limit | No limit |
| Purpose | Operating systems, databases, and other high-throughput data storage | Web content management, storage for entertainment data processing, container storage, big data analysis | Storing objects such as web content, logs, etc. | Long-term preservation of large-scale data |
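The factors and storage options above can be combined into a simple decision rule. The following is a minimal sketch, not an official selection procedure: the `Workload` fields and the order of the rules in `suggest_storage` are assumptions made for illustration, loosely derived from the "Access configuration" and "Purpose" rows of the table.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a workload's storage needs."""
    access_frequency: str      # "hot", "warm", or "cold"
    shared_over_network: bool  # many clients mount the same files (NFS/CIFS)
    accessed_via_rest: bool    # data is read and written through an S3-compatible API


def suggest_storage(w: Workload) -> str:
    """Return a storage type from the comparison table (illustrative rules only)."""
    if w.access_frequency == "cold":
        return "Archive Storage"   # long-term retention of large-scale data
    if w.accessed_via_rest:
        return "Object Storage"    # web content, logs, and other objects
    if w.shared_over_network:
        return "File Storage"      # shared access for heterogeneous clients
    return "Block Storage"         # operating systems, databases


print(suggest_storage(Workload("hot", False, False)))
```

A real decision would also weigh the update frequency, encryption, and data protection requirements listed in the tables.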
Select Database
Databases are generally used to standardize on a common platform and improve management efficiency.
Select a database based on your data requirements. An unsuitable choice can increase system latency and degrade performance.
Choosing a database depends on the application’s requirements such as availability, scalability, data structure, throughput, durability, and so on.
Among the many factors to consider when choosing a database, access patterns have a significant impact on technology selection, so it is advisable to optimize the database based on them.
Most databases provide configuration options for workload optimization, and you can also review operational aspects such as scalability, backup, recovery, and maintenance, along with memory, cache, and storage optimization.
In this document, we will examine various features to meet the application’s database requirements.
OLTP (Online Transaction Processing)
Most traditional relational databases use the online transaction processing (OLTP) model.
Samsung Cloud Platform provides managed Database services for relational databases such as EPAS, MySQL, MariaDB, PostgreSQL, and Microsoft SQL Server.
Relational databases are suitable for applications that handle complex business transactions such as finance and e‑commerce, and they are advantageous for data aggregation and processing complex queries.
The considerations for relational database optimization are as follows.
- Server type selection, including computing, memory, storage, and networking
- Storage volume configuration
- Selection of a database engine that fits your needs
- Database options such as schema, index, and view
Relational databases can increase throughput through vertical scaling, and horizontal scaling of read operations is also possible using replicas.
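Scaling reads with replicas usually means sending write statements to the primary and spreading read statements across the replicas. The following is a minimal sketch of that routing idea; `ReplicaRouter` and the endpoint names are hypothetical and do not correspond to any Samsung Cloud Platform API.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._readers = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Treat statements that start with SELECT as reads (a simplification).
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
targets = [router.route(q) for q in (
    "SELECT * FROM orders",
    "INSERT INTO orders VALUES (1)",
    "SELECT count(*) FROM orders",
)]
print(targets)  # ['replica-1', 'primary', 'replica-2']
```

Replicas may lag behind the primary, so reads that must see the latest write are typically still sent to the primary.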
OLAP (Online Analytical Processing)
To analyze large-scale structured data, you can use a data warehouse platform. On Samsung Cloud Platform, you can implement a column-oriented, high-performance MPP analytics environment through Vertica (DBaaS).
Modern data warehouse technologies adopt a columnar format and use MPP (Massively Parallel Processing) to improve data analysis speed.
With a columnar format, aggregating data from a single column does not require scanning the entire table.
As a result, less data is scanned than with a row format, and query performance improves.
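The difference in scanned data can be illustrated with a small in-memory comparison. This is only a sketch with made-up sample data; real columnar engines add compression and on-disk layouts that are not modeled here.

```python
rows = [
    {"order_id": 1, "region": "KR", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "KR", "amount": 200},
]

# Row format: every field of every row is read to sum one column.
values_read_row = sum(len(r) for r in rows)
total_row = sum(r["amount"] for r in rows)

# Column format: each column is stored contiguously, so only "amount" is read.
columns = {name: [r[name] for r in rows] for name in rows[0]}
values_read_col = len(columns["amount"])
total_col = sum(columns["amount"])

print(total_row, total_col)              # 400 400
print(values_read_row, values_read_col)  # 9 3
```

Both layouts return the same total, but the columnar layout touches one third of the values in this example.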
MPP distributes data across subordinate nodes for storage and receives queries on the leader node.
The leader node distributes queries to the subordinate nodes based on the partition key.
Each node then takes a portion of the query and processes it in parallel.
After that, the leader node collects query results from each subordinate node and returns the aggregated result.
This parallel processing method speeds up query execution and enables larger volumes of data to be processed more quickly.
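The leader and subordinate node flow described above is a scatter-gather pattern. The following sketch simulates it in a single process, with threads standing in for subordinate nodes; the modulo placement in `node_for` and the sample rows are assumptions for illustration, not how Vertica assigns data.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3


def node_for(partition_key: int) -> int:
    """Map a partition key to a subordinate node (simple modulo placement)."""
    return partition_key % NUM_NODES


# Distribute rows (customer_id, amount) across nodes by partition key.
rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)]
node_data = {n: [] for n in range(NUM_NODES)}
for customer_id, amount in rows:
    node_data[node_for(customer_id)].append(amount)


def partial_sum(node_id: int) -> int:
    """Work done on one subordinate node: aggregate only its local data."""
    return sum(node_data[node_id])


def leader_sum() -> int:
    """Leader node: fan the query out, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(partial_sum, range(NUM_NODES)))
    return sum(partials)


print(leader_sum())  # 210
```

Each node only scans its own share of the data, which is why adding nodes can shorten query time for large tables.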
NoSQL
In various applications such as social media, the Internet of Things, clickstream data, and logs, large amounts of unstructured and semi-structured data are generated.
Such data has a dynamic schema, and each record can have a different structure.
Storing such data in a relational database can be inefficient.
Relational databases must store data based on a fixed schema, which can result in unnecessary null values being stored or data loss occurring.
Non-relational (NoSQL) databases can store data flexibly without being constrained by a fixed schema.
Records stored in the same table can each have a different number of columns.
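The cost of forcing such records into a fixed schema can be seen by counting the null values that result. This is a minimal sketch using hypothetical event records; plain dictionaries stand in for documents in a NoSQL store.

```python
# Hypothetical events with different shapes, as a document store might hold them.
events = [
    {"id": 1, "type": "click", "page": "/home"},
    {"id": 2, "type": "sensor", "temperature": 21.5, "unit": "C"},
    {"id": 3, "type": "log", "level": "ERROR", "message": "timeout"},
]

# A fixed relational schema needs one column per possible field.
all_columns = sorted({key for e in events for key in e})
fixed_rows = [[e.get(col) for col in all_columns] for e in events]
null_count = sum(value is None for row in fixed_rows for value in row)

print(all_columns)
print(null_count)  # 10 of the 21 cells are null
```

Stored as schema-less records, the same three events carry only the fields they actually use.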
NoSQL databases can store large volumes of data and provide low latency.
They also natively support horizontal scaling, so they can be expanded easily by adding nodes when needed.
However, NoSQL databases generally do not support complex queries such as table and entity joins, so a relational database is more appropriate when such queries are required.
In Samsung Cloud Platform, you can use CacheStore as a Redis-based in-memory database, which can be used for high-performance database caching or for storing application state.
Data Search
There are cases where you need to quickly search large amounts of data to promptly resolve issues or gain business insights.
Searching application data provides access to detailed information and helps analyze it from various perspectives.
To retrieve data with low latency and high throughput, search engine technology must be used.
Samsung Cloud Platform provides a Search Engine service.
Search Engine automates the creation and configuration of ElasticSearch for data analysis.
The Search Engine can be deployed on a VM, and its availability and performance can be improved through cluster and replica configurations.
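Search engines such as Elasticsearch achieve low-latency lookups by maintaining an inverted index that maps each term to the documents containing it. The following is a minimal sketch of that idea with made-up log messages; it does not use the Elasticsearch API and omits ranking, analyzers, and sharding.

```python
from collections import defaultdict

docs = {
    1: "disk latency alert on database server",
    2: "user login failed",
    3: "database connection timeout",
}

# Build the inverted index: term -> set of document IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)


def search(query: str) -> list:
    """Return IDs of documents that contain every term in the query."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    if not term_sets:
        return []
    return sorted(set.intersection(*term_sets))


print(search("database"))          # [1, 3]
print(search("database timeout"))  # [3]
```

A query only touches the entries for its own terms, so lookup cost does not grow with the total number of documents in the same way a full scan does.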
Database Performance Improvement
DB Optimization
Database performance improvement means designing and operating the database so that it sustains its performance for as long as possible. Efficiency from a business-wide perspective, such as response time and throughput per unit time, matters more than managing the performance of the server alone.
Database performance degrades due to continuous business changes, increasing concurrent users, and ongoing data growth.
Generally, database performance is defined as the response time to a user’s request.
Optimal database performance can be defined as achieving maximum performance with minimal resources.
Factors that degrade database performance can arise from the initial analysis and design stages through development and pre‑operation phases.
| Step | Optimization area | Description |
|---|---|---|
| Analysis | Business process optimization | Remove inefficient elements and optimize processes in line with the business vision and strategy. |
| Analysis | Architecture | Set the architecture direction by considering transaction throughput, performance, data growth trends, security, and availability. |
| Design | Physical design | Consider response time, distributed DB environment, concurrent user count, data size, parallel processing and distribution, centralization, and design for redundancy. |
| Design | Application design | Consider access paths, data request patterns, and indexes to achieve optimal performance when integrated with the DB. |
| Development | SQL | Comply with the performance policy by improving developers’ capabilities and adhering to development standards. |
| Operation | OS tuning | Perform tuning for CPU, memory, disk I/O, etc. |
| Operation | Network tuning | Perform tuning based on the transmission volume of data, files, etc. |
| Operation | DB tuning | Perform tuning on elements such as data architecture, parameters, and log files. |
| Operation | Application tuning | Continuously monitor the production system and tune the SQL, index policies, cluster policies, etc. of applications whose performance has degraded. |
Caching Implementation
Caching is a process that temporarily stores data or files at an intermediate location between the client and permanent storage to handle future requests more quickly and reduce network traffic.
Caching reuses previously retrieved data, which can increase application speed and reduce costs. The following content shows the caching mechanisms at each layer.
| Layer | Target | Caching implementation |
|---|---|---|
| Web layer | Web content | Improving web server content delivery latency → Content delivery using a Global CDN |
| Application layer | User session data | Improving application performance and data access performance using a key/value store and local cache → State management using CacheStore |
| Database layer | Data | Reduce latency when requesting database queries using database buffers and key/value stores → Implement data caching with CacheStore, offload read load through Replica configuration |
The performance efficiency of the web layer is primarily related to the delivery of static content such as images, videos, and HTML pages.
Static content delivered from a location geographically close to the user experiences reduced latency and faster response times.
By using a global CDN to implement caching, you can deliver content from locations close to the user, providing a better user experience.
By applying caching in the Application layer, you can store the results of complex repetitive requests, reducing business logic calculations and database accesses.
Additionally, by implementing a state-management database and separating state storage from the application server, you can improve service performance while avoiding session loss or concentration during horizontal scaling of the servers.
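Separating state storage from the application server can be sketched as follows. `SessionStore` is an in-memory stand-in for an external key/value store such as CacheStore, and `AppServer` is a hypothetical application server; neither reflects an actual product API.

```python
class SessionStore:
    """Stand-in for an external key/value store such as a Redis-based cache."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class AppServer:
    """Stateless application server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.set(f"session:{session_id}", {"user": user})

    def whoami(self, session_id):
        session = self.store.get(f"session:{session_id}")
        return session["user"] if session else None


store = SessionStore()
server_a, server_b = AppServer("a", store), AppServer("b", store)
server_a.login("s1", "alice")
print(server_b.whoami("s1"))  # a different server still sees the session
```

Because no server holds session data locally, servers can be added or removed during horizontal scaling without logging users out.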
In general, overall service speed and throughput are determined by database performance.
For services that use relational databases, it is difficult to add capacity by scaling the server horizontally, and vertical scaling has limits, so performance management requires considerable effort.
Applying caching to the database can significantly increase database throughput and reduce data retrieval latency.
Placing a Redis-based CacheStore (DBaaS) in front of the database, or configuring a replica of the Database service to distribute read load, is also an effective strategy for improving performance.
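One common way to place a cache in front of a database is the cache-aside pattern: read from the cache first, and query the database only on a miss. The following is a minimal sketch under stated assumptions: a dictionary stands in for CacheStore, `products.get` stands in for a database query, and the `CacheAside` class and its 60-second TTL are illustrative choices.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db          # function that queries the database
        self._ttl = ttl_seconds
        self._cache = {}                   # key -> (expires_at, value)
        self.db_calls = 0

    def get(self, key):
        entry = self._cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                # cache hit
        self.db_calls += 1                 # cache miss: go to the database
        value = self._load(key)
        self._cache[key] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._cache.pop(key, None)


products = {"p1": "keyboard"}              # stand-in for a database table
cache = CacheAside(products.get)
first = cache.get("p1")
second = cache.get("p1")
print(first, second, cache.db_calls)       # keyboard keyboard 1
```

The TTL bounds how long stale data can be served, and `invalidate` covers the case where the application knows a row has just changed.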