Large-Scale Data Processing Architecture
Overview
Samsung Cloud Platform provides a data warehouse (DW) database service through Vertica(DBaaS), a database dedicated to analytics. The Vertica service lets you build and manage Vertica clusters easily and provides a UI for managing cluster information and status.
Vertica is designed with a Masterless Pure-MPP (Massively Parallel Processing) architecture, making it suitable for fast parallel analysis of large-scale data, and includes In-DB Machine Learning and Advanced Analytics. It stores reference data as well as financial, health, monitoring, and event information, and enables users to quickly extract and analyze the information they need using a variety of analytical capabilities.
Data can also be extracted and analyzed later, whenever and wherever it is needed, by linking data stores such as Object Storage or HDFS.
Architecture Diagram
- Users access various services (finance, hospital, events).
- To ensure service continuity, the RDB is configured for HA, and NoSQL is set up as a cluster to store the variety of data that users need.
- Using the CacheStore cache service, we quickly deliver content to users and store session information to reduce processing time.
- Using the messaging service Event Streams, data is sent to the appropriate target repository in real time or in batch mode.
- Elastic Stack is used to collect data from multiple systems (Logstash), analyze and search it (Search Engine), and present the results through visualization (Kibana).
- Data stored in RDBs and NoSQL, etc., is transferred and stored in Vertica(DBaaS) via batch processing.
- The various data stored in Vertica(DBaaS) is then used for analysis.
- Data stored in Object Storage or HDFS can also be linked to Vertica(DBaaS) for data analysis.
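The batch transfer step in the list above can be sketched as follows. This is a minimal illustration, not the platform's transfer mechanism: it uses Python's built-in `sqlite3` as a stand-in for both the source RDB and the analytics target, and the `events` table, its columns, and the batch size are all hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def read_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Yield rows from the hypothetical source table in fixed-size batches."""
    cursor = conn.execute("SELECT id, payload FROM events ORDER BY id")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield rows


def transfer(source: sqlite3.Connection, target: sqlite3.Connection,
             batch_size: int = 2) -> int:
    """Copy all rows from source to target, committing once per batch."""
    total = 0
    for batch in read_batches(source, batch_size):
        target.executemany(
            "INSERT INTO events (id, payload) VALUES (?, ?)", batch
        )
        target.commit()
        total += len(batch)
    return total


def make_db(rows: List[Row]) -> sqlite3.Connection:
    """Create an in-memory database with the hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    return conn


# Stand-ins: source_db plays the RDB, target_db plays the analytics store.
source_db = make_db([(1, "login"), (2, "payment"), (3, "logout")])
target_db = make_db([])
copied = transfer(source_db, target_db)
```

Committing once per batch keeps memory use bounded and limits how much work is repeated if a run fails partway. A real pipeline would also need to track which rows have already been transferred.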
Use Cases
Monitoring System Setup
When you need to inspect dozens of servers periodically and analyze anomalies, a collection agent on each server gathers and stores various server information (server configuration data, system logs, software installation details, security information, etc.).
You can then identify problematic servers by matching the collected data against predefined anomaly patterns.
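As a rough sketch of how collected data might be matched against predefined anomaly patterns, the example below checks log lines per server against regular expressions. The pattern names, the regular expressions, and the server names are all hypothetical; in practice this matching would more likely be expressed as queries over the data stored in Vertica(DBaaS).

```python
import re
from typing import Dict, List

# Hypothetical anomaly patterns; real ones would be defined by your operators.
ANOMALY_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "disk_full": re.compile(r"no space left on device", re.IGNORECASE),
    "auth_failure": re.compile(r"failed password", re.IGNORECASE),
}


def find_anomalies(logs_by_server: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per server, the sorted names of the patterns found in its logs.

    Servers with no matching log line are omitted from the result.
    """
    flagged: Dict[str, List[str]] = {}
    for server, lines in logs_by_server.items():
        hits = {
            name
            for name, pattern in ANOMALY_PATTERNS.items()
            for line in lines
            if pattern.search(line)
        }
        if hits:
            flagged[server] = sorted(hits)
    return flagged


# Hypothetical log lines gathered by the collection agents.
collected = {
    "web-01": ["sshd: Failed password for root", "cron: job finished"],
    "db-01": ["write error: No space left on device"],
    "app-01": ["service started"],
}
problem_servers = find_anomalies(collected)
```

Here `problem_servers` contains only `web-01` and `db-01`, each mapped to the names of the patterns that matched.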
Data-Centric Hospital
We store the medical records (diagnosis, affected area, common symptoms, special notes, treatment status, etc.) of patients from multiple contracted hospitals in one place.
When a patient visits, a rapid diagnosis can be made based on symptoms, and the treatment status and methods can be provided to the patient.
Prerequisites
To deploy a Vertica service, you must bring your own license (BYOL).
Constraints
Backups for Vertica(DBaaS) consist of an initial full backup followed by incremental backups (Incremental Snapshot). Because backups are snapshot-based and do not use transaction logs, you can restore only to the point in time of a specific snapshot.
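The practical effect of snapshot-based restore can be sketched as follows: for any desired recovery time, the closest restorable state is the newest snapshot taken at or before it. The function name and the daily 02:00 schedule are hypothetical, and this is a conceptual model rather than the service's restore logic.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def latest_snapshot_at_or_before(
    snapshot_times: List[datetime], wanted: datetime
) -> Optional[datetime]:
    """Return the newest snapshot time that is <= wanted, or None.

    Assumes snapshot_times is sorted in ascending order.
    """
    index = bisect_right(snapshot_times, wanted)
    return snapshot_times[index - 1] if index > 0 else None


# Hypothetical schedule: one snapshot per day at 02:00.
snapshots = [
    datetime(2024, 1, 1, 2, 0),
    datetime(2024, 1, 2, 2, 0),
    datetime(2024, 1, 3, 2, 0),
]

# Wanting the state as of 15:30 on Jan 2 yields the 02:00 snapshot of Jan 2.
restorable = latest_snapshot_at_or_before(snapshots, datetime(2024, 1, 2, 15, 30))
```

Any changes made between the selected snapshot and the desired time cannot be recovered this way, so the snapshot interval effectively sets the maximum data loss window.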
Considerations
Vertica’s license limits data capacity, so a sudden increase in data during service use can cause problems. Estimate in advance the amount of data you intend to store, and purchase a license sized accordingly.
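One simple way to approach such an estimate is to project data growth at a compound monthly rate and find when it would exceed the licensed capacity. The sketch below assumes a constant growth rate, which real workloads rarely follow exactly, and all figures are hypothetical; how capacity is measured depends on your license terms.

```python
from typing import Optional


def projected_size_tb(current_tb: float, monthly_growth_rate: float,
                      months: int) -> float:
    """Compound the current data size by a fixed monthly growth rate."""
    return current_tb * (1 + monthly_growth_rate) ** months


def months_until_limit(current_tb: float, monthly_growth_rate: float,
                       license_tb: float, horizon: int = 120) -> Optional[int]:
    """Return the first month in which projected size exceeds the license.

    Returns None if the limit is not exceeded within `horizon` months.
    """
    for month in range(1, horizon + 1):
        if projected_size_tb(current_tb, monthly_growth_rate, month) > license_tb:
            return month
    return None


# Hypothetical figures: 10 TB today, 5% monthly growth, 20 TB license.
months_left = months_until_limit(10.0, 0.05, 20.0)
```

With these figures the projection stays under 20 TB through month 14 (about 19.8 TB) and crosses it in month 15 (about 20.8 TB), which gives a rough deadline for revisiting the license size.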
Related Services
This is a list of Samsung Cloud Platform services that are associated with the features or configurations described in this guide. Refer to it when selecting and designing services.
| Service Group | Service | Description |
|---|---|---|
| Database | PostgreSQL(DBaaS) | A service that easily creates and manages open-source PostgreSQL in a web environment |
| Database | MySQL(DBaaS) | A service that simplifies creating and managing MySQL, a small yet powerful open-source relational database |
| Database | CacheStore | Key-value in-memory data store with fast data processing capability |
| Storage | Object Storage | Object storage that simplifies data storage and retrieval |
| Data Analytics | Event Streams | A service that creates and manages Apache Kafka clusters |
| Data Analytics | Search Engine | A service that easily creates and manages Elasticsearch in a web environment |
