Large-Scale Data Processing Architecture

Overview

Samsung Cloud Platform provides Vertica(DBaaS), a data warehouse (DW) database service built on Vertica, an analytics-dedicated database. The service lets you easily build and manage Vertica clusters and provides a UI for managing cluster information and status.

Vertica is designed with a Masterless Pure-MPP (Massively Parallel Processing) architecture, making it suitable for fast parallel analysis of large-scale data, and includes In-DB Machine Learning and Advanced Analytics. It stores reference data as well as financial, health, monitoring, and event information, and enables users to quickly extract and analyze the information they need using a variety of analytical capabilities.

Going forward, data held in various data stores such as Object Storage or HDFS can also be extracted and analyzed whenever and wherever it is needed.

Architecture Diagram

Figure 1. Large-Scale Data Processing Architecture
  1. Users access various services (finance, hospital, events).
  2. To ensure service continuity, the RDB is configured for high availability (HA), and NoSQL is deployed as a cluster to store the various kinds of data that users need.
  3. The CacheStore cache service delivers content to users quickly and stores session information to reduce processing time.
  4. The Event Streams messaging service sends data to the appropriate target repository in real time or in batch mode.
  5. Elastic Stack collects data from multiple systems (Logstash), analyzes and searches it (Search Engine), and presents the results through visualization (Kibana).
  6. Data stored in the RDB, NoSQL, and other stores is transferred to Vertica(DBaaS) by batch processing.
  7. The data accumulated in Vertica(DBaaS) is used for analysis.
  8. Data stored in Object Storage or HDFS can also be linked to Vertica(DBaaS) for analysis.
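
The batch transfer in step 6 can be sketched roughly as follows. This is a minimal illustration, not the platform's actual pipeline: two in-memory SQLite databases stand in for the source RDB and for Vertica(DBaaS), and the table names (events, events_dw), columns, and batch size are all hypothetical. A real deployment would connect through a Vertica client and would normally prefer a bulk-load path over row inserts.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str, float]


def read_in_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Yield rows from the (hypothetical) source table in fixed-size batches."""
    cur = conn.execute("SELECT id, kind, amount FROM events ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            return
        yield batch


def transfer(source: sqlite3.Connection, target: sqlite3.Connection, batch_size: int = 2) -> int:
    """Copy all rows from source to target, committing once per batch."""
    total = 0
    for batch in read_in_batches(source, batch_size):
        with target:  # commits on success, rolls back if the insert fails
            target.executemany(
                "INSERT INTO events_dw (id, kind, amount) VALUES (?, ?, ?)", batch
            )
        total += len(batch)
    return total


def make_demo_databases() -> Tuple[sqlite3.Connection, sqlite3.Connection]:
    """Create stand-ins: 'source' plays the RDB, 'target' plays the warehouse."""
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, amount REAL)")
    source.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(1, "payment", 10.0), (2, "visit", 0.0), (3, "payment", 25.5),
         (4, "alert", 0.0), (5, "payment", 4.5)],
    )
    source.commit()
    target = sqlite3.connect(":memory:")
    target.execute("CREATE TABLE events_dw (id INTEGER, kind TEXT, amount REAL)")
    target.commit()
    return source, target


source_db, target_db = make_demo_databases()
moved = transfer(source_db, target_db, batch_size=2)

# Step 7: once the data is in the warehouse, analysis is an ordinary SQL query.
payment_total = target_db.execute(
    "SELECT SUM(amount) FROM events_dw WHERE kind = 'payment'"
).fetchone()[0]
print(moved, payment_total)
```

Committing per batch keeps each transaction small, so a failure partway through leaves only whole batches in the target.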

Use Cases

Monitoring System Setup

When you need to periodically inspect dozens of servers and analyze anomalies, a collection agent on each server collects and stores server information (server configuration data, system logs, software installation details, security information, etc.).

You can then identify the problematic servers by matching the collected data against predefined anomaly patterns.
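
As a rough illustration of matching collected data against predefined anomaly patterns, the sketch below checks log lines against a small set of regular expressions. The pattern names, the regular expressions, the server names, and the log lines are all hypothetical examples, and in practice this kind of matching would more likely run as queries over the data stored in Vertica(DBaaS).

```python
import re
from typing import Dict, List

# Hypothetical predefined anomaly patterns, keyed by a short name.
ANOMALY_PATTERNS: Dict[str, re.Pattern] = {
    "disk_full": re.compile(r"No space left on device"),
    "failed_login": re.compile(r"Failed password for (invalid user )?\S+"),
    "oom_kill": re.compile(r"Out of memory: Killed process \d+"),
}


def find_anomalies(logs_by_server: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per server, the sorted names of the patterns its logs match.

    Servers whose logs match no pattern are omitted from the result.
    """
    flagged: Dict[str, List[str]] = {}
    for server, lines in logs_by_server.items():
        hits = sorted(
            {
                name
                for line in lines
                for name, pattern in ANOMALY_PATTERNS.items()
                if pattern.search(line)
            }
        )
        if hits:
            flagged[server] = hits
    return flagged


# Sample data standing in for what the collection agents would store.
collected = {
    "web-01": [
        "sshd: Failed password for invalid user admin from 10.0.0.5",
        "cron: job finished",
    ],
    "db-01": [
        "kernel: Out of memory: Killed process 4312 (postgres)",
        "write error: No space left on device",
    ],
    "app-01": ["service started"],
}

problem_servers = find_anomalies(collected)
print(problem_servers)
```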

Data-Centric Hospital

The medical records (diagnosis, affected area, common symptoms, special notes, treatment status, etc.) of patients from multiple contracted hospitals are stored in one place.

When a patient visits, a rapid diagnosis can be made based on symptoms, and the treatment status and methods can be provided to the patient.

Prerequisites

To deploy the Vertica service, you must bring your own license (BYOL).

Constraints

Backup for Vertica(DBaaS) takes an initial full backup followed by incremental backups (Incremental Snapshot). Restore is to the point in time of a specific snapshot; the transaction log method is not used.
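
One practical consequence of snapshot-based restore is that the available restore points are the snapshot times themselves. The sketch below shows how to pick the snapshot that would be used for a requested time. The function name and the sample timestamps are hypothetical and are not part of the service's API.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def restore_point(snapshots: List[datetime], requested: datetime) -> Optional[datetime]:
    """Return the latest snapshot taken at or before `requested`.

    Returns None when the requested time is earlier than every snapshot.
    Changes made after the returned snapshot are not part of the restore.
    """
    ordered = sorted(snapshots)
    idx = bisect_right(ordered, requested)
    return ordered[idx - 1] if idx else None


# Hypothetical schedule: one full backup, then daily incremental snapshots.
snapshot_times = [
    datetime(2024, 3, 1, 0, 0),  # initial full backup
    datetime(2024, 3, 2, 0, 0),  # incremental snapshot
    datetime(2024, 3, 3, 0, 0),  # incremental snapshot
]

chosen = restore_point(snapshot_times, datetime(2024, 3, 2, 15, 30))
too_early = restore_point(snapshot_times, datetime(2024, 2, 28, 12, 0))
print(chosen, too_early)
```

In this example a request for 15:30 on March 2 resolves to the midnight snapshot of that day, so the snapshot interval determines how much recent data a restore can lose.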

Considerations

Vertica’s license limits data capacity, so a sudden increase in data during service use can become a problem. Estimate the amount of data you intend to store in advance, and purchase a license sized accordingly.
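
A back-of-the-envelope estimate along these lines can help with license sizing. The sketch assumes simple linear growth, and the function names, the headroom factor, and every figure are hypothetical; how the license actually measures data volume should be confirmed against your license terms.

```python
import math
from typing import Optional


def months_until_limit(current_tb: float, monthly_growth_tb: float,
                       license_tb: float) -> Optional[int]:
    """Months until the data volume reaches the licensed capacity.

    Assumes linear growth. Returns 0 if the limit is already reached and
    None if the data is not growing.
    """
    if current_tb >= license_tb:
        return 0
    if monthly_growth_tb <= 0:
        return None
    return math.ceil((license_tb - current_tb) / monthly_growth_tb)


def required_license_tb(current_tb: float, monthly_growth_tb: float,
                        months: int, headroom: float = 0.25) -> float:
    """Capacity to license for a planning horizon, plus headroom for spikes."""
    projected = current_tb + monthly_growth_tb * months
    return projected * (1 + headroom)


# Hypothetical figures: 6 TB today, growing by 0.5 TB per month.
runway = months_until_limit(current_tb=6.0, monthly_growth_tb=0.5, license_tb=10.0)
needed = required_license_tb(current_tb=6.0, monthly_growth_tb=0.5, months=24)
print(runway, needed)
```

The headroom factor is there because the concern is a sudden increase in data, which a linear projection alone does not cover.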

Related Services

This is a list of Samsung Cloud Platform services that are associated with the features or configurations described in this guide. Refer to it when selecting and designing services.

Service group | Service | Detailed description
Database | PostgreSQL(DBaaS) | A service that easily creates and manages open-source PostgreSQL in a web environment
Database | MySQL(DBaaS) | A service that simplifies creating and managing MySQL, a small yet powerful open-source relational database
Database | CacheStore | A key-value in-memory data store with fast data processing capability
Storage | Object Storage | Object storage that simplifies data storage and retrieval
Data Analytics | Event Stream | A service that creates and manages Apache Kafka clusters
Data Analytics | Search Engine | A service that easily creates and manages Elasticsearch in a web environment
Table. List of related services