
Overview

Service Overview

AIOS provides an environment in which, after you create Virtual Server, GPU Server, or Kubernetes Engine resources on Samsung Cloud Platform, you can develop AI applications that use LLMs on those resources without installing or configuring a separate LLM service.

Features

  • Convenient LLM usage: An LLM Endpoint is provided by default, so you can use LLMs directly from Samsung Cloud Platform resources such as Virtual Server, GPU Server, and Kubernetes Engine.
  • Improved AI development productivity: AI developers can use various models through the same API. Compatibility with the OpenAI and LangChain SDKs is supported, allowing easy integration with existing development environments and frameworks.
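Because the endpoint follows the OpenAI chat completions API shape, a request can be built and sent with nothing but the standard library. The sketch below is illustrative only: the endpoint URL is a placeholder, and the model name is taken from the model list later in this page; substitute the values shown on your resource's detail page.

```python
# Minimal sketch of calling an OpenAI-compatible chat completions
# endpoint using only the Python standard library.
import json
import urllib.request

# Placeholder: use the LLM Endpoint shown on your resource's detail page.
ENDPOINT = "http://<your-llm-endpoint>/v1/chat/completions"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def extract_answer(response: dict) -> str:
    """Pull the assistant's reply out of an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]

def call_endpoint(payload: dict) -> dict:
    """POST the payload to the endpoint and decode the JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Since the request and response shapes are OpenAI-compatible, the same payload also works through the OpenAI SDK by pointing its `base_url` at the endpoint.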

Service Configuration Diagram

Figure. AIOS diagram

Provided Features

AIOS provides the following features.

  • AIOS LLM Endpoint: When you apply for the Virtual Server, GPU Server, or Kubernetes Engine service, the detail page of the created resource shows the LLM Endpoint information and a usage guide. By following the guide, you can connect to and use the LLM from that resource.
  • AIOS Report: You can check the number of calls and token usage by type, resource, and model, as well as total usage per LLM.
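The per-model breakdown the report shows can also be reproduced client-side from the `usage` field of OpenAI-style responses. The sketch below assumes that response shape; it is not an AIOS Report API (none is documented here), just an illustration of the counters involved.

```python
# Sketch: tally calls and token usage per model from a list of
# OpenAI-style chat completion responses.
from collections import defaultdict

def tally_usage(responses: list) -> dict:
    """Return {model: {"calls", "prompt_tokens", "completion_tokens"}}."""
    totals = defaultdict(
        lambda: {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for r in responses:
        t = totals[r["model"]]
        t["calls"] += 1
        t["prompt_tokens"] += r["usage"]["prompt_tokens"]
        t["completion_tokens"] += r["usage"]["completion_tokens"]
    return dict(totals)
```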

Provided Models

The LLM models provided by AIOS are as follows.

gpt-oss-120b (Chat+Reasoning)
  Introduction: Open-source GPT-series model with 120 billion parameters; the latest model in the series
  Main uses: Research and experimentation, large-scale language understanding, AI services requiring complex reasoning/analysis, building agent-type systems
  Features:
  • Ultra-large parameter count
  • Broad knowledge coverage and general-purpose usability
  • Full chain-of-thought (CoT) generation

Qwen3-Coder-30B-A3B-Instruct (Code)
  Introduction: Qwen3-series code model optimized for code generation and debugging
  Main uses: Software development, AI code assistants, long document/repository analysis
  Features:
  • Trained on large-scale code knowledge
  • Multilingual support
  • Long-context understanding

Qwen3-30B-A3B-Thinking-2507 (Chat+Reasoning)
  Introduction: Qwen3 model enhanced for long-form reasoning and deep thinking (Thinking)
  Main uses: Research, analysis reports, logical writing, mathematics, science, coding
  Features:
  • Specialized in long-form and complex reasoning
  • Consistent chain-of-thought (CoT) generation

Llama-4-Scout (Chat+Vision)
  Introduction: Latest Llama model with multimodal capability
  Main uses: Document analysis and summarization, customer support and chatbots
  Features:
  • Multimodal (text + image), fast inference, runnable on a single GPU
  • Summarization/analysis of very long texts and multiple documents
  • Top performance among models of its class on various benchmarks
  • Accepts up to 4 input images

Llama-Guard-4-12B (Moderation)
  Introduction: Core security and moderation model for improving the reliability and safety of the latest large language models and multimodal AI services
  Main uses: Automatic filtering of harmful user inputs and model responses
  Features:
  • Multimodal safety classification
  • Specialized in content moderation
  • Multilingual support

bge-m3 (Embedding)
  Introduction: Core embedding model with three characteristics: multi-functionality, multilingual support, and large-scale input handling
  Main uses: Retrieving external knowledge and providing answer evidence in generative AI, combining dense and sparse retrieval to ensure both accuracy and generalization performance
  Features:
  • Multi-functionality: dense embedding retrieval (Dense Retrieval), token-based weighted retrieval (Sparse Retrieval), and multi-vector retrieval (Multi-Vector Retrieval)
  • Multi-linguality: supports more than 100 languages
  • Multi-granularity: handles inputs of up to 8,192 tokens

bge-reranker-v2-m3 (Rerank)
  Introduction: Core component for information retrieval, question answering, and chatbot systems that require fast, accurate re-ranking of search results in multilingual environments
  Main uses: Re-ranking candidate answers or documents for a query in order of relevance
  Features:
  • Lightweight, high-speed inference
  • Multilingual support
  • Easy integration: compatible with Hugging Face Transformers and FlagEmbedding
Table. LLM models provided by AIOS
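To make the dense-retrieval role of the embedding model above concrete, the sketch below ranks documents against a query by cosine similarity of their embedding vectors. The tiny vectors in the test are made up; in practice each vector would come from the endpoint's embeddings API for bge-m3.

```python
# Sketch: dense retrieval by cosine similarity over embedding vectors.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

A reranker such as bge-reranker-v2-m3 would then re-score the top candidates from this first-pass ranking using the query and document text directly.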

Region-specific provision status

AIOS is available in the following regions.

Region | Availability
Korea West (kr-west1) | Provided
Korea East (kr-east1) | Not provided
Korea South 1 (kr-south1) | Not provided
Korea South 2 (kr-south2) | Not provided
Korea South 3 (kr-south3) | Not provided
Table. AIOS regional provision status

Prerequisite Services

The following services must be configured before you create the service. For details, refer to the guide provided for each service and prepare them in advance.

Service Category | Service | Detailed Description
Compute | Virtual Server | Virtual server optimized for cloud computing
Compute | GPU Server | Virtual server suited to tasks requiring fast computation, such as AI model experiments, prediction, and inference in a cloud environment
Compute | Cloud Functions | Serverless-computing-based FaaS (Function as a Service)
Container | Kubernetes Engine | Service providing lightweight virtual computing, containers, and Kubernetes clusters to manage them
Table. AIOS prerequisite services