Overview

    Service Overview

    AIOS provides an environment in which, after creating Virtual Server, GPU Server, or Kubernetes Engine resources on the Samsung Cloud Platform, you can develop AI applications using LLMs on those resources without separately installing or configuring an LLM service.

    Features

    • Convenient LLM usage: Provides an LLM Endpoint by default, so you can use the LLM directly from resources such as Virtual Server, GPU Server, and Kubernetes Engine on the Samsung Cloud Platform.
    • Improved AI development productivity: AI developers can use various models through the same API, and compatibility with the OpenAI and LangChain SDKs allows easy integration with existing development environments and frameworks.
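    Because the endpoint follows the OpenAI-compatible API format, a request can be sketched with only the Python standard library. The endpoint URL below is a placeholder, not a real address; the actual URL is shown on the created resource's detail page.

```python
import json
from urllib import request

# Placeholder: replace with the LLM Endpoint URL shown on your resource's detail page.
AIOS_ENDPOINT = "http://<your-llm-endpoint>/v1/chat/completions"


def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def call_llm(payload: dict) -> dict:
    """POST the payload to the AIOS LLM Endpoint (requires network access to the endpoint)."""
    req = request.Request(
        AIOS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_chat_request("gpt-oss-120b", "Hello")
# response = call_llm(payload)  # run from a resource with endpoint access
```

    The same payload shape works with the OpenAI SDK by pointing its `base_url` at the endpoint, which is what enables reuse of existing OpenAI-based code.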

    Service Configuration Diagram

    Figure. AIOS service configuration diagram

    Provided Features

    AIOS provides the following features.

    • AIOS LLM Endpoint: When you apply for the Virtual Server, GPU Server, or Kubernetes Engine service, the detail page of the created resource shows the LLM Endpoint information and a usage guide; by following the guide, you can connect to the LLM from that resource and use it.
    • AIOS Report: You can check the number of calls and token usage by type, resource, and model, as well as the total LLM usage.

    Provided Models

    The LLM models provided by AIOS are as follows.

    Model Name: gpt-oss-120b
    Model Type: Chat + Reasoning
    Introduction: Open-source GPT-series model with 120 billion parameters; the latest model in the series
    Main Uses: Research and experimentation, large-scale language understanding, AI services requiring complex reasoning and analysis, building agent-type systems
    Features:
    • Ultra-large parameter count
    • Broad knowledge coverage and general-purpose usability
    • Full CoT (chain-of-thought) generation

    Model Name: Qwen3-Coder-30B-A3B-Instruct
    Model Type: Code
    Introduction: Qwen3-series code model optimized for code generation and debugging
    Main Uses: Software development, AI code assistants, long document/repository analysis
    Features:
    • Trained on large-scale code knowledge
    • Multilingual support
    • Long-context understanding

    Model Name: Qwen3-30B-A3B-Thinking-2507
    Model Type: Chat + Reasoning
    Introduction: Qwen3 model enhanced for long-form reasoning and deep thinking
    Main Uses: Research, analysis reports, logical writing, mathematics, science, coding
    Features:
    • Specialized in long-form and complex reasoning
    • Consistent CoT (chain-of-thought) generation

    Model Name: Llama-4-Scout
    Model Type: Chat + Vision
    Introduction: Latest Llama model with multimodal capability
    Main Uses: Document analysis and summarization, customer support and chatbots
    Features:
    • Multimodal (text + image), fast inference, runs on a single GPU
    • Handles very long texts and multi-document summarization/analysis
    • Top performance among peer models on various benchmarks
    • Accepts up to 4 input images

    Model Name: Llama-Guard-4-12B
    Model Type: Moderation
    Introduction: Security and moderation model that improves the reliability and safety of the latest large language model and multimodal AI services
    Main Uses: Automatic filtering of harmful user inputs and model responses
    Features:
    • Multimodal safety classification
    • Content moderation specialization
    • Multilingual support

    Model Name: bge-m3
    Model Type: Embedding
    Introduction: Embedding model with three core characteristics: multi-functionality, multilinguality, and multi-granularity (large-scale input handling)
    Main Uses: Retrieving external knowledge and providing answer evidence in generative AI, combining dense and sparse retrieval to ensure both accuracy and generalization
    Features:
    • Multi-Functionality: dense retrieval, token-based weighted sparse retrieval, and multi-vector retrieval
    • Multi-Linguality: supports more than 100 languages
    • Multi-Granularity: handles inputs of up to 8,192 tokens

    Model Name: bge-reranker-v2-m3
    Model Type: Rerank
    Introduction: Core component for information retrieval, question answering, and chatbot systems that require fast, accurate re-ranking of search results in multilingual environments
    Main Uses: Re-ranking candidate answers or documents by relevance to a query
    Features:
    • Lightweight, high-speed inference
    • Multilingual support
    • Easy integration: compatible with Hugging Face Transformers and FlagEmbedding
    Table. LLM models provided by AIOS
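    As a sketch of how the bge-m3 embedding model can support retrieval, the request below follows the OpenAI-compatible embeddings format, and similarity between returned vectors can be scored locally. The URL is a placeholder and the exact response shape should be confirmed against the endpoint's usage guide.

```python
import math

# Placeholder: replace with the LLM Endpoint URL from your resource's detail page.
EMBED_URL = "http://<your-llm-endpoint>/v1/embeddings"


def build_embedding_request(texts: list[str]) -> dict:
    """OpenAI-compatible embeddings payload targeting the bge-m3 model."""
    return {"model": "bge-m3", "input": texts}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (dense retrieval scoring)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Build a request for a query and a candidate document; the vectors returned by
# the endpoint would then be compared with cosine_similarity to rank documents.
req = build_embedding_request(["What is AIOS?", "AIOS provides LLM endpoints."])
```

    In a typical RAG setup, the top documents ranked this way would then be passed to bge-reranker-v2-m3 for a more precise relevance ordering before being fed to a chat model.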

    Region-specific provision status

    AIOS is available in the following environment.

    Region | Availability
    Korea West (kr-west1) | Provided
    Korea East (kr-east1) | Not provided
    Korea South 1 (kr-south1) | Not provided
    Korea South 2 (kr-south2) | Not provided
    Korea South 3 (kr-south3) | Not provided
    Table. AIOS regional provision status

    Prerequisite Services

    The following services must be configured before creating this service. For details, refer to the guide provided for each service and prepare in advance.

    Service Category | Service | Detailed Description
    Compute | Virtual Server | Virtual server optimized for cloud computing
    Compute | GPU Server | Virtual server suited to tasks that require fast computation, such as AI model experiments, prediction, and inference, in a cloud environment
    Compute | Cloud Functions | Serverless computing based on FaaS (Function as a Service)
    Container | Kubernetes Engine | Service that provides lightweight virtual computing and containers, along with Kubernetes clusters to manage them
    Table. AIOS prerequisite services