Large-Scale Data Processing Architecture

Overview

Samsung Cloud Platform provides Vertica (DBaaS), a data warehouse (DW) database service built on Vertica, an analytics-only database. The service lets you easily build and manage a Vertica cluster and provides a UI for managing cluster information and status.

Vertica is designed with a masterless pure-MPP (Massively Parallel Processing) architecture, which makes it suitable for parallel analysis of large-scale data, and it includes in-database machine learning and advanced analytics features. It can store not only general information but also financial, health, monitoring, and event data, and its analysis features let users quickly extract and analyze the information they need.

In the future, it will also be possible to easily extract and analyze data from other data storage systems such as Object Storage or HDFS.

Architecture Diagram

Figure 1. Large-Scale Data Processing Architecture
  1. Users use various services (finance, hospital, events).
  2. To ensure service continuity, the RDB is configured for high availability (HA) and NoSQL is configured as a cluster; together they store the various data that users need.
  3. The CacheStore cache service delivers content to users quickly and stores session information to reduce processing time.
  4. The Event Streams message service transmits data to the target storage in real time or in batches.
  5. The Elastic Stack collects data from multiple systems (Logstash), analyzes and searches it (Search Engine), and presents it through visualization (Kibana).
  6. Data stored in the RDB and NoSQL is transferred to Vertica (DBaaS) through batch processing and stored there.
  7. The data stored in Vertica (DBaaS) is used for analysis.
  8. Data stored in Object Storage or HDFS can also be linked to Vertica (DBaaS) for analysis.
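
The batch transfer in step 6 can be illustrated with a minimal sketch. It is not the platform's actual mechanism: two in-memory SQLite databases stand in for the source RDB and for Vertica (DBaaS), and the `events` table, its columns, and the `copy_in_batches` helper are hypothetical names chosen for illustration. A real deployment would use the source and target databases' own client libraries and bulk-load facilities.

```python
import sqlite3


def copy_in_batches(source, target, batch_size=2):
    """Copy all rows of source.events into target.events, batch by batch."""
    cursor = source.execute("SELECT id, payload FROM events ORDER BY id")
    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # Each batch is its own transaction: committed on success,
        # rolled back if the insert raises.
        with target:
            target.executemany(
                "INSERT INTO events (id, payload) VALUES (?, ?)", rows
            )
        copied += len(rows)
    return copied


# Stand-ins (assumption): in-memory SQLite for both the RDB and the analytics store.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
source.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"event-{i}") for i in range(1, 6)],
)
source.commit()

copied = copy_in_batches(source, target)
print(f"copied {copied} rows")
```

Committing per batch keeps each transaction small, so a failure partway through loses at most one batch rather than the whole run.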

Use Cases

Building a Monitoring System

To periodically check dozens of servers for signs of abnormality and analyze them, an agent on each server collects and stores various information (server settings, system logs, software installation information, security information, etc.).

The collected data can be matched against predefined abnormality patterns to identify problematic servers.
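
As a rough sketch of this matching step, the example below checks collected log messages against a small set of patterns and groups the matches by server. The pattern names, regular expressions, server names, and the `find_problem_servers` helper are all hypothetical; in practice the matching would more likely run as queries inside the analytics database.

```python
import re

# Hypothetical abnormality patterns; real ones would be defined by operators.
ABNORMAL_PATTERNS = {
    "disk_full": re.compile(r"No space left on device"),
    "auth_failure": re.compile(r"Failed password for .* from"),
}


def find_problem_servers(records):
    """Map each flagged server to the sorted names of the patterns it matched."""
    flagged = {}
    for server, message in records:
        for name, pattern in ABNORMAL_PATTERNS.items():
            if pattern.search(message):
                flagged.setdefault(server, set()).add(name)
    return {server: sorted(names) for server, names in flagged.items()}


# Illustrative (server, log message) pairs as an agent might report them.
records = [
    ("web-01", "write error: No space left on device"),
    ("web-02", "service started"),
    ("db-01", "Failed password for root from 10.0.0.5 port 22"),
    ("web-01", "Failed password for admin from 10.0.0.9 port 22"),
]

problem_servers = find_problem_servers(records)
for server, names in problem_servers.items():
    print(server, names)
```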

Data-Centric Hospital

Patient medical records from multiple contracted hospitals (disease name, occurrence location, general symptoms, special circumstances, treatment status, etc.) are stored in one place.

When a patient visits, a diagnosis can be made quickly based on symptoms, and the patient can be given information on treatment status and methods.

Prerequisites

Building the Vertica service requires the customer to bring their own Vertica license (BYOL).

Limitations

Vertica (DBaaS) is backed up with an initial full backup followed by incremental snapshots, and it can be restored to a specific point in time. The restore is not based on transaction logs.
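
One way to read this limitation is sketched below, under the assumption that, without transaction logs, the available restore points are the times at which snapshots were taken. The guide does not confirm this, and the `latest_restorable_point` helper and the timestamps are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def latest_restorable_point(snapshot_times, requested):
    """Return the newest snapshot time at or before `requested`, or None."""
    ordered = sorted(snapshot_times)
    index = bisect_right(ordered, requested)
    return ordered[index - 1] if index else None


# Hypothetical backup history: one full backup, then daily incrementals.
snapshots = [
    datetime(2024, 1, 1, 0, 0),  # initial full backup
    datetime(2024, 1, 2, 0, 0),  # incremental snapshot
    datetime(2024, 1, 3, 0, 0),  # incremental snapshot
]

# A request for mid-afternoon on Jan 2 resolves to the Jan 2 snapshot.
restore_point = latest_restorable_point(snapshots, datetime(2024, 1, 2, 15, 30))
print(restore_point)
```

Under this assumption, changes made between the chosen snapshot and the requested time would not be recovered, which is worth weighing when deciding how often snapshots are taken.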

Considerations

Vertica licenses limit data capacity, so a sudden increase in data during service use may cause problems. Estimate in advance how much data you will store, and purchase a license that covers it.
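
A simple way to make such an estimate is sketched below. It assumes linear monthly growth plus a safety margin. The `required_license_tb` helper, the 20% default headroom, and the sample figures are illustrative assumptions, not Vertica or Samsung Cloud Platform sizing guidance.

```python
def required_license_tb(current_tb, monthly_growth_tb, months, headroom=0.2):
    """Project stored data after `months` of linear growth, plus headroom."""
    if min(current_tb, monthly_growth_tb, months, headroom) < 0:
        raise ValueError("all inputs must be non-negative")
    projected_tb = current_tb + monthly_growth_tb * months
    return projected_tb * (1 + headroom)


# Hypothetical workload: 10 TB today, growing 0.5 TB per month, sized for 24 months.
estimate_tb = required_license_tb(current_tb=10, monthly_growth_tb=0.5, months=24)
print(f"license capacity to plan for: about {estimate_tb:.1f} TB")
```

The headroom term is there to absorb the sudden increases mentioned above. If growth is seasonal or accelerating, a linear projection will understate the requirement.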

Related Services

The following Samsung Cloud Platform services are related to the features and configurations described in this guide. Refer to this list when selecting and designing services.

Service Group | Service | Detailed Description
Database | PostgreSQL (DBaaS) | A service for easily creating and managing open-source PostgreSQL in a web environment
Database | MySQL (DBaaS) | A service for easily creating and managing MySQL, a small but powerful open-source relational database
Database | CacheStore | A key-value in-memory data store with fast data processing
Storage | Object Storage | Object storage that makes data easy to store and retrieve
Data Analytics | Event Streams | A service for creating and managing Apache Kafka clusters
Data Analytics | Search Engine | A service for easily creating and managing Elasticsearch in a web environment
Table. Related Services List