Overview

1: ServiceWatch Metrics

Service Overview

AIOS provides an environment where, after creating Virtual Server, GPU Server, and Kubernetes Engine resources on the Samsung Cloud Platform, you can develop AI applications using LLMs on those resources without installing or configuring a separate LLM service.

Features

Convenient LLM Use Samsung Cloud Platform provides LLM Endpoints by default, allowing you to use LLMs directly on Virtual Server, GPU Server, and Kubernetes Engine resources.
AI Development Productivity Improvement : AI developers can use various models with the same API, and support compatibility with OpenAI and LangChain SDKs, allowing easy integration with existing development environments and frameworks.
ServiceWatch service integration provided: You can monitor data through the ServiceWatch service.

Service Architecture Diagram

Provided features

We provide the following features.

AIOS LLM Endpoint Provision: When you request Virtual Server, GPU Server, or Kubernetes Engine services, the detailed page of the created resource provides LLM Endpoint information and a usage guide, and you can connect to the LLM on that resource and use it according to the guide.
AIOS Report provided: You can view the number of calls and token usage by type, by resource, and by model, as well as the total usage per LLM.

Provided model

The LLM models provided by AIOS are as follows.

Model name	model type	Introduction	Main uses	feature
gpt-oss-120b	Chat+Reasoning	Open-source GPT-series model based on 120 billion parameters, latest model	Research and experimentation, large-scale language understanding, AI services requiring complex reasoning/analysis, and construction of agent-based systems.	Huge Parameters Broad knowledge coverage, universal applicability Complete CoT chain generation
Qwen3-Coder-30B-A3B-Instruct	Code	Qwen3 series code model optimized for code generation and debugging	software development, AI code assistant, long document/repository analysis	Large-scale code knowledge learning Multilingual support Long-context understanding possible
Qwen3-30B-A3B-Thinking-2507	Chat+Reasoning	Qwen3 model enhanced for long-form reasoning and deep thinking (Thinking)	Research, analysis report, logical writing, mathematics, science, coding	Specialized for long-form and complex reasoning Generate consistent CoT chains
Llama-4-Scout	Chat+Vision	The latest Llama model with multimodal capability	Document analysis·summarization, customer support·chatbot	Multimodal (text+image), fast inference, runnable on a single GPU Supports ultra‑long text, multi‑document summarization/analysis, multimodal support State‑of‑the‑art performance across various benchmarks Up to 4 images can be input
Llama-Guard-4-12B	moderation	Core security and moderation models to enhance reliability and safety in the latest large language models and multimodal AI services.	Used for automatically filtering harmful content in user inputs and model responses.	Multimodal security classification Content moderation specialization Multilingual support
bge-m3	embedding	a core embedding model with three characteristics: multifunctionality, multilingual capability, and support for large-scale inputs	In generative AI, it is used to combine Dense and Sparse retrieval for external knowledge search and answer evidence provision, ensuring both accuracy and generalization performance.	Multi-Functionality: Dense Embedding Retrieval(Dense Retriveval), Token-based Weighted Retrieval(Sparse Retrieval), Multi-Vector Retrieval(Multi-Vector Retrieval) Multi-Linguality: Supports more than 100 languages Multi-Granularity: Handles up to 8,192 tokens
bge-reranker-v2-m3	rerank	A core component of various information retrieval, question answering, and chatbot systems that require fast and accurate re‑ranking of search results in multilingual environments.	Reorder candidate answers or documents for a question by relevance	Lightweight and fast inference Multilingual support Easy integration: compatible with Hugging Face Transformers, FlagEmbedding

Table. AIOS-provided LLM models

Availability by Region

AIOS is available in the environments below.

region	Provision status
Korea West (kr-west1)	Provide
Korea East (kr-east1)	Not provided
South Korea South 1 (kr-south1)	Not provided
South Korea South 2 (kr-south2)	Not provided
South Korea 3 (kr-south3)	Not provided

Table. AIOS availability by region

Preceding Service

This is a list of services that must be pre-configured before creating the service. Please refer to the guide provided for each service for details and prepare in advance.

Service Category	service	Detailed description
Compute	Virtual Server	Virtual server optimized for cloud computing
Compute	GPU Server	A virtual server suitable for tasks that require fast computation speed, such as AI model experiments, predictions, and inference, in a cloud environment.
Compute	Cloud Functions	Serverless computing-based Faas (Function as a Service)
Container	Kubernetes Engine	A service that provides lightweight virtual computing, containers, and Kubernetes clusters for managing them

Table. AIOS Preceding Services

1 - ServiceWatch Metrics

AIOS sends metrics to ServiceWatch. The metrics provided by default monitoring are data collected at 5‑minute intervals.

Reference

Refer to the ServiceWatch guide for how to view metrics in ServiceWatch.

Basic Metrics

The following are the basic metrics for the AIOS namespace.

The indicators whose names are shown in bold below are the indicators selected as major indicators among the basic indicators provided by AIOS. Key metrics are used to build service dashboards that are automatically created for each service in ServiceWatch.

Each metric indicates through the user guide which statistical value is meaningful when viewing that metric, and among the meaningful statistics, the statistical values shown in bold text are the primary statistics. In the service dashboard, primary metrics can be viewed using the primary statistical values.

Performance items	Detailed description	unit	meaningful statistics
Model Request Success	Number of successful model requests	Count	Total
Model Request Server Error	Number of model request failures (server error)	Count	Total
Model Request Client Error	Model request failure count (client error)	Count	Total
Model Request Throttled	The number of model requests is limited	Count	Total
Model Total Tokens	Model token usage (total)	Count	Total
Model Input Tokens	Model token usage (input)	Count	Total
Model Output Tokens	Model token usage (output)	Count	Total
Model Cached Tokens	Number of input tokens reused through cache	Count	Total

Table. AIOS basic indicators