The page has been translated by Gen AI.

Overview

Service Overview

AIOS provides an environment where, after creating Virtual Server, GPU Server, and Kubernetes Engine resources on the Samsung Cloud Platform, you can develop AI applications using LLMs on those resources without installing or configuring a separate LLM service.

Features

  • Convenient LLM Use Samsung Cloud Platform provides LLM Endpoints by default, allowing you to use LLMs directly on Virtual Server, GPU Server, and Kubernetes Engine resources.
  • AI Development Productivity Improvement : AI developers can use various models with the same API, and support compatibility with OpenAI and LangChain SDKs, allowing easy integration with existing development environments and frameworks.
  • ServiceWatch service integration provided: You can monitor data through the ServiceWatch service.

Service Architecture Diagram

Diagram
Figure. AIOS diagram

Provided features

We provide the following features.

  • AIOS LLM Endpoint Provision: When you request Virtual Server, GPU Server, or Kubernetes Engine services, the detailed page of the created resource provides LLM Endpoint information and a usage guide, and you can connect to the LLM on that resource and use it according to the guide.
  • AIOS Report provided: You can view the number of calls and token usage by type, by resource, and by model, as well as the total usage per LLM.

Provided model

The LLM models provided by AIOS are as follows.

Model namemodel typeIntroductionMain usesfeature
gpt-oss-120bChat+ReasoningOpen-source GPT-series model based on 120 billion parameters, latest modelResearch and experimentation, large-scale language understanding, AI services requiring complex reasoning/analysis, and construction of agent-based systems.
  • Huge Parameters
  • Broad knowledge coverage, universal applicability
  • Complete CoT chain generation
Qwen3-Coder-30B-A3B-InstructCodeQwen3 series code model optimized for code generation and debuggingsoftware development, AI code assistant, long document/repository analysis
  • Large-scale code knowledge learning
  • Multilingual support
  • Long-context understanding possible
Qwen3-30B-A3B-Thinking-2507Chat+ReasoningQwen3 model enhanced for long-form reasoning and deep thinking (Thinking)Research, analysis report, logical writing, mathematics, science, coding
  • Specialized for long-form and complex reasoning
  • Generate consistent CoT chains
Llama-4-ScoutChat+VisionThe latest Llama model with multimodal capabilityDocument analysis·summarization, customer support·chatbot
  • Multimodal (text+image), fast inference, runnable on a single GPU
  • Supports ultra‑long text, multi‑document summarization/analysis, multimodal support
  • State‑of‑the‑art performance across various benchmarks
  • Up to 4 images can be input
Llama-Guard-4-12BmoderationCore security and moderation models to enhance reliability and safety in the latest large language models and multimodal AI services.Used for automatically filtering harmful content in user inputs and model responses.
  • Multimodal security classification
  • Content moderation specialization
  • Multilingual support
bge-m3embeddinga core embedding model with three characteristics: multifunctionality, multilingual capability, and support for large-scale inputsIn generative AI, it is used to combine Dense and Sparse retrieval for external knowledge search and answer evidence provision, ensuring both accuracy and generalization performance.
  • Multi-Functionality: Dense Embedding Retrieval(Dense Retriveval), Token-based Weighted Retrieval(Sparse Retrieval), Multi-Vector Retrieval(Multi-Vector Retrieval)
  • Multi-Linguality: Supports more than 100 languages
  • Multi-Granularity: Handles up to 8,192 tokens
bge-reranker-v2-m3rerankA core component of various information retrieval, question answering, and chatbot systems that require fast and accurate re‑ranking of search results in multilingual environments.Reorder candidate answers or documents for a question by relevance
  • Lightweight and fast inference
  • Multilingual support
  • Easy integration: compatible with Hugging Face Transformers, FlagEmbedding
Table. AIOS-provided LLM models

Availability by Region

AIOS is available in the environments below.

regionProvision status
Korea West (kr-west1)Provide
Korea East (kr-east1)Not provided
South Korea South 1 (kr-south1)Not provided
South Korea South 2 (kr-south2)Not provided
South Korea 3 (kr-south3)Not provided
Table. AIOS availability by region

Preceding Service

This is a list of services that must be pre-configured before creating the service. Please refer to the guide provided for each service for details and prepare in advance.

Service CategoryserviceDetailed description
ComputeVirtual ServerVirtual server optimized for cloud computing
ComputeGPU ServerA virtual server suitable for tasks that require fast computation speed, such as AI model experiments, predictions, and inference, in a cloud environment.
ComputeCloud FunctionsServerless computing-based Faas (Function as a Service)
ContainerKubernetes EngineA service that provides lightweight virtual computing, containers, and Kubernetes clusters for managing them
Table. AIOS Preceding Services
AI-ML
ServiceWatch Metrics