API Reference

    API Reference Overview

    AIOS supports the following APIs.

    | API name | API | Detailed description |
    |---|---|---|
    | Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a document list. |
    | Score API | POST /score, /v1/score | Predicts the similarity of two sentences. |
    | Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client. |
    | Completions API | POST /v1/completions | Compatible with OpenAI's Completions API and can be used with the OpenAI Python client. |
    | Embedding API | POST /v1/embeddings | Converts text into high-dimensional vectors (embeddings) that can be used for natural language processing (NLP) tasks such as similarity calculation between texts, clustering, and search. |
    Table. AIOS supported API list

    Rerank API

    POST /rerank, /v1/rerank, /v2/rerank
    

    Overview

    The Rerank API predicts the relevance between a single query and each item in a document list by applying an embedding model or a cross-encoder model. Generally, the score of a sentence pair represents the similarity between the two sentences on a scale from 0 to 1.

    • Embedding-based model: The query and each document are converted into vectors, and a score is computed from the similarity between the vectors (e.g., cosine similarity).
    • Reranker (cross-encoder) model: The query and each document are fed into the model together as a pair, and the model predicts the relevance score directly.
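
The embedding-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the AIOS implementation: the function names and the toy three-dimensional vectors are hypothetical, and real embeddings would come from a model such as the one behind the Embedding API. Raw cosine similarity lies between -1 and 1, and how the service maps it onto the 0 to 1 scale is not specified here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, highest score first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (hypothetical values, not real model output).
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, docs)
print(ranking)
```

A cross-encoder skips the separate vector step and scores each query and document pair in a single model call, so it cannot be reproduced with vector arithmetic alone.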

    Request

    Context

    | Key | Type | Description | Example |
    |---|---|---|---|
    | Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
    | Request Method | string | HTTP method used in API requests | POST |
    | Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
    | Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |
    Table. Re-rank API - Context

    Path Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Re-rank API - Path Parameters

    Query Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Re-rank API - Query Parameters

    Body Parameters

    | Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|---|
    | model | - | string | | Specify the model to use for response generation | | | "sds/bge-reranker-v2-m3" |
    | query | - | string | | User's search query or question | | | "What is the capital of France?" |
    | documents | - | array | | List of documents to be reordered | | Maximum model input length limit | ["The capital of France is Paris."] |
    | top_n | - | integer | | Specify the number of top documents to return (0 returns all) | 0 | > 0 | 5 |
    | truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
    Table. Re-rank API - Body Parameters

    Example

    curl -X "POST" \
       {AIOS LLM private endpoint}/rerank \
      -H "Content-Type: application/json" \
      -d '{
        "model": "sds/bge-reranker-v2-m3",
        "query": "What is the capital of France?",
        "documents": [
          "The capital of France is Paris.",
          "France capital city is known for the Eiffel Tower.",
          "Paris is located in the north-central part of France."
        ],
        "top_n": 2, 
        "truncate_prompt_tokens": 512
      }'
    Code block. Re-Rank API Request Example
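
For callers working in Python, the same request can be assembled with the standard library. The sketch below only builds the request object and does not send it. `BASE_URL` is a hypothetical placeholder for your AIOS LLM private endpoint, and `build_rerank_request` is an illustrative helper name, not part of any AIOS SDK.

```python
import json
import urllib.request

# Hypothetical placeholder; substitute your AIOS LLM private endpoint.
BASE_URL = "http://aios-endpoint.example"


def build_rerank_request(base_url, model, query, documents, top_n=0):
    """Build (but do not send) a POST request for the /rerank path."""
    body = {"model": model, "query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rerank_request(
    BASE_URL,
    "sds/bge-reranker-v2-m3",
    "What is the capital of France?",
    [
        "The capital of France is Paris.",
        "Paris is located in the north-central part of France.",
    ],
    top_n=2,
)
print(request.get_method(), request.full_url)
# To send it, pass the object to urllib.request.urlopen() and parse the JSON body.
```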

    Response

    200 OK

    | Name | Type | Description |
    |---|---|---|
    | id | string | Unique identifier of the API response (UUID format) |
    | model | string | Name of the model that generated the result |
    | usage | object | Object containing resource information used in the request |
    | usage.total_tokens | integer | Total number of tokens used for request processing |
    | results | array | An array containing the results of documents related to the query |
    | results[].index | integer | The index number within the result array |
    | results[].document | object | An object containing the contents of the retrieved document |
    | results[].document.text | string | The actual text content of the retrieved document |
    | results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |
    Table. Re-rank API - 200 OK

    Error Code

    | HTTP status code | Error Code description |
    |---|---|
    | 400 | Bad Request |
    | 422 | Validation Error |
    | 500 | Internal Server Error |
    Table. Re-rank API - Error Code

    Example

    {
      "id": "rerank-scp-aios-rerank",
      "model": "sds/bge-m3",
      "usage": {
        "total_tokens": 65
      },
      "results": [
        {
          "index": 0,
          "document": {
            "text": "The capital of France is Paris."
          },
          "relevance_score": 0.8291233777999878
        },
        {
          "index": 1,
          "document": {
            "text": "France capital city is known for the Eiffel Tower."
          },
          "relevance_score": 0.6996355652809143
        }
      ]
    }
    Code block. Re-Rank API Response Example
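
A client will usually want the returned documents ordered by score, optionally with a cutoff. The following sketch shows one way to do that with the field names from the 200 OK table. The sample body reuses the values of the response example (with the entries deliberately swapped to show the sort), and `top_documents` and `min_score` are hypothetical names introduced for illustration.

```python
import json

# Sample body using the fields and values from the response example.
raw = """
{
  "id": "rerank-scp-aios-rerank",
  "usage": {"total_tokens": 65},
  "results": [
    {"index": 1,
     "document": {"text": "France capital city is known for the Eiffel Tower."},
     "relevance_score": 0.6996355652809143},
    {"index": 0,
     "document": {"text": "The capital of France is Paris."},
     "relevance_score": 0.8291233777999878}
  ]
}
"""


def top_documents(response, min_score=0.0):
    """Return (index, text, score) tuples sorted by descending relevance."""
    kept = [r for r in response["results"] if r["relevance_score"] >= min_score]
    kept.sort(key=lambda r: r["relevance_score"], reverse=True)
    return [(r["index"], r["document"]["text"], r["relevance_score"]) for r in kept]


response = json.loads(raw)
ranked = top_documents(response)
for index, text, score in ranked:
    print(f"{score:.3f}  [{index}] {text}")
```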

    Reference

    Score API

    POST /score, /v1/score
    

    Overview

    The Score API predicts the similarity between two sentences. This API calculates the score using one of two models.

    • Reranker (cross-encoder) model: Takes a pair of sentences as input and directly predicts a similarity score.
    • Embedding model: Generates an embedding vector for each sentence, then computes the cosine similarity between the vectors to derive a score.

    Request

    Context

    | Key | Type | Description | Example |
    |---|---|---|---|
    | Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
    | Request Method | string | HTTP method used in API requests | POST |
    | Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
    | Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |
    Table. Score API - Context

    Path Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Score API - Path Parameters

    Query Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Score API - Query Parameters

    Body Parameters

    | Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|---|
    | model | - | string | | Specify the model to use for response generation | | | "sds/bge-reranker-v2-m3" |
    | encoding_format | - | string | | Score return format | float | "float" (default), "int" | "float" |
    | text_1 | - | string, array | | First text to compare | | string (""); maximum input length limit of the model | "What is the capital of France?" |
    | text_2 | - | string, array | | Second text to compare | | string (""); maximum input length limit of the model | ["The capital of France is Paris."] |
    | truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
    Table. Score API - Body Parameters

    Example

    curl -X "POST" \
      {AIOS LLM private endpoint}/score \
      -H "Content-Type: application/json" \
      -d '{
        "model": "sds/bge-reranker-v2-m3",
        "encoding_format": "float",
        "text_1": [
          "What is the largest planet in the solar system?",
          "What is the chemical symbol for water?"
        ],
        "text_2": [
          "Jupiter is the largest planet in the solar system.",
          "The chemical formula of water is H₂O."
        ]
      }'
    Code block. Score API Request Example

    Response

    200 OK

    | Name | Type | Description |
    |---|---|---|
    | id | string | Unique identifier of the response |
    | object | string | Response object's type (example: "list") |
    | created | integer | Creation time (Unix timestamp, in seconds) |
    | model | string | Name of the model used |
    | data | array | Score calculation result list |
    | data.index | integer | Index of the item in the data array |
    | data.object | string | Data item type (example: "score") |
    | data.score | number | Calculated score value, normalized to a range of 0 to 1 |
    | usage | object | Token usage statistics |
    | usage.prompt_tokens | integer | Number of tokens used in the input prompt |
    | usage.total_tokens | integer | Total token count (input + output) |
    | usage.completion_tokens | integer | Number of tokens used in the generated response |
    | usage.prompt_tokens_details | null | Prompt token details |
    Table. Score API - 200 OK

    Error Code

    | HTTP status code | Error Code description |
    |---|---|
    | 400 | Bad Request |
    | 422 | Validation Error |
    | 500 | Internal Server Error |
    Table. Score API - Error Code

    Example

    {
      "id": "score-scp-aios-score",
      "object": "list",
      "created": 1748574112,
      "model": "sds/bge-reranker-v2-m3",
      "data": [
        {
          "index": 0,
          "object": "score",
          "score": 1.0
        },
        {
          "index": 1,
          "object": "score",
          "score": 1.0
        }
      ],
      "usage": {
        "prompt_tokens": 53,
        "total_tokens": 53,
        "completion_tokens": 0,
        "prompt_tokens_details": null
      }
    }
    Code block. Score API Response Example
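
To make the scores usable, a client typically joins them back to the input sentences. The sketch below assumes, based only on the example (two entries in each input array, two scores returned), that element i of text_1 is compared with element i of text_2 and that data.index identifies that pair. This pairing rule is an assumption, and `pair_scores` is a hypothetical helper name.

```python
import json

text_1 = [
    "What is the largest planet in the solar system?",
    "What is the chemical symbol for water?",
]
text_2 = [
    "Jupiter is the largest planet in the solar system.",
    "The chemical formula of water is H2O.",
]

# The "data" portion of the response example above.
response = json.loads("""
{
  "object": "list",
  "data": [
    {"index": 0, "object": "score", "score": 1.0},
    {"index": 1, "object": "score", "score": 1.0}
  ]
}
""")


def pair_scores(first, second, scored):
    """Attach each score to its sentence pair (assumes pairwise comparison)."""
    if len(first) != len(second):
        raise ValueError("this sketch assumes equally long input arrays")
    by_index = {item["index"]: item["score"] for item in scored["data"]}
    return [(a, b, by_index[i]) for i, (a, b) in enumerate(zip(first, second))]


pairs = pair_scores(text_1, text_2, response)
for a, b, score in pairs:
    print(f"{score:.2f}  {a!r} vs {b!r}")
```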

    Reference

    Chat Completions API

    POST /v1/chat/completions
    

    Overview

    The Chat Completions API is compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client.

    Request

    Context

    | Key | Type | Description | Example |
    |---|---|---|---|
    | Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
    | Request Method | string | HTTP method used in API requests | POST |
    | Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
    | Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "messages": [{ "role": "user", "content": "hello" }], "stream": true } |
    Table. Chat Completions API - Context

    Path Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Chat Completions API - Path Parameters

    Query Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Chat Completions API - Query Parameters

    Body Parameters

    | Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|---|
    | model | - | string | | Specify the model to use for response generation | | | "meta-llama/Llama-3.3-70B-Instruct" |
    | messages | role | string | | Message list containing conversation history | | | [{ "role": "user", "content": "message" }] |
    | frequency_penalty | - | number | | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
    | logit_bias | - | object | | Adjust the probability of a specific token (example: { "100": 2.0 }) | null | Key: Token ID, Value: -100 ~ 100 | { "100": 2.0 } |
    | logprobs | - | boolean | | Whether to return log probabilities of the output tokens | false | true, false | true |
    | max_completion_tokens | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
    | max_tokens (Deprecated) | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
    | n | - | integer | | Specify the number of responses to generate | 1 | | 3 |
    | presence_penalty | - | number | | Adjust the penalty for tokens already present in the existing text | 0 | -2.0 ~ 2.0 | 1.0 |
    | seed | - | integer | | Specify the seed value for controlling randomness | None | | |
    | stop | - | string / array / null | | Stop generation when a specific string appears | null | | "\n" |
    | stream | - | boolean | | Whether to return results in streaming mode | false | true/false | true |
    | stream_options | include_usage, continuous_usage_stats | object | | Control streaming options (e.g., whether to include usage statistics) | null | | { "include_usage": true } |
    | temperature | - | number | | Adjust the creativity of the generated output (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
    | tool_choice | - | string | | Adjust which tool is invoked by the model. none: do not invoke any tool; auto: let the model choose whether to generate a message or invoke a tool; required: the model must invoke one or more tools | none when there is no tool; auto when there is a tool | | |
    | tools | - | array | | List of tools the model can invoke. Only functions are supported as tools; up to 128 functions are supported | None | | |
    | top_logprobs | - | integer | | Specify the number of most probable tokens to return, as an integer between 0 and 20. Each token is associated with a log probability value; logprobs must be set to true | None | 0 ~ 20 | 3 |
    | top_p | - | number | | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
    Table. Chat Completions API - Body Parameters
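
When stream is set to true, OpenAI-compatible servers normally return server-sent events: each line starts with `data:` and carries a JSON chunk whose choices[].delta.content holds a piece of the reply, and the stream ends with `data: [DONE]`. The sketch below assumes AIOS follows that convention, which this page does not spell out. The sample lines are hypothetical, not captured output, and `collect_stream_text` is an illustrative name.

```python
import json


def collect_stream_text(lines):
    """Join the content deltas from OpenAI-style server-sent event lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


# Hypothetical stream; the last JSON chunk mimics include_usage output.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Seoul"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "."}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 54, "completion_tokens": 2, "total_tokens": 56}}',
    "data: [DONE]",
]
streamed_text = collect_stream_text(sample_lines)
print(streamed_text)
```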

    Example

    curl -X "POST" \
      {AIOS LLM private endpoint}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "/mnt/models/Meta-Llama-3.3-70B-Instruct",
        "messages": [
          {
            "role": "system",
            "content": "You are a helpful assistant."
          },
          {
            "role": "user",
            "content": "What is the capital of Korea?"
          }
        ]
      }'
    Code block. Chat Completions API Request Example
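
Before sending a request, it can help to check sampling parameters against the boundary values listed in the body parameters table. The sketch below builds a request body and rejects out-of-range values. `build_chat_body` and `RANGES` are hypothetical names, only four parameters are covered, and whether the server itself rejects or clamps such values is not stated in this document.

```python
# Boundary values copied from the body parameters table.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
}


def build_chat_body(model, user_text, system_text=None, **sampling):
    """Return a request body dict, validating the sampling parameters first."""
    for name, value in sampling.items():
        if name not in RANGES:
            raise ValueError(f"parameter not covered by this sketch: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} ~ {high}")
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, **sampling}


body = build_chat_body(
    "meta-llama/Llama-3.3-70B-Instruct",
    "What is the capital of Korea?",
    system_text="You are a helpful assistant.",
    temperature=0.7,
)
print(body)
```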

    Response

    200 OK

    | Name | Type | Description |
    |---|---|---|
    | id | string | Unique identifier of the response |
    | object | string | Response object's type (example: "chat.completion") |
    | created | integer | Creation time (Unix timestamp, in seconds) |
    | model | string | Name of the model used |
    | choices | array | List of generated response options |
    | choices[].index | integer | The index of the corresponding choice |
    | choices[].message | object | Generated message object |
    | choices[].message.role | string | The role of the message author (e.g., "assistant") |
    | choices[].message.content | string | The actual content of the generated message |
    | choices[].message.reasoning_content | string | The content of the generated reasoning message |
    | choices[].message.tool_calls | array (optional) | Tool invocation information (may be included depending on model/settings) |
    | choices[].finish_reason | string or null | Reason the response was terminated (e.g., "stop", "length") |
    | choices[].stop_reason | object or null | Additional stop reason details |
    | choices[].logprobs | object or null | Log probability information per token (included depending on settings) |
    | usage | object | Token usage statistics |
    | usage.prompt_tokens | integer | Number of tokens used in the input prompt |
    | usage.completion_tokens | integer | Number of tokens used in the generated response |
    | usage.total_tokens | integer | Total token count (input + output) |
    Table. Chat Completions API - 200 OK

    Error Code

    | HTTP status code | Error Code description |
    |---|---|
    | 400 | Bad Request |
    | 422 | Validation Error |
    | 500 | Internal Server Error |
    Table. Chat Completions API - Error Code

    Example

    {
      "id": "chatcmpl-scp-aios-chat-completions",
      "object": "chat.completion",
      "created": 1749702816,
      "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "reasoning_content": null,
            "content": "The capital of South Korea is Seoul.",
            "tool_calls": []
          },
          "logprobs": null,
          "finish_reason": "stop",
          "stop_reason": null
        }
      ],
      "usage": {
        "prompt_tokens": 54,
        "total_tokens": 62,
        "completion_tokens": 8,
        "prompt_tokens_details": null
      },
      "prompt_logprobs": null
    }
    Code block. Chat Completions API Response Example
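
Reading the reply out of a chat completion response comes down to a few field lookups. The sketch below works on a trimmed copy of the response example and uses the field names from the 200 OK table. `first_reply` is a hypothetical helper name, and only the first choice is inspected.

```python
import json

# Trimmed copy of the response example above.
raw = """
{
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of South Korea is Seoul.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {"prompt_tokens": 54, "total_tokens": 62, "completion_tokens": 8}
}
"""


def first_reply(response):
    """Summarize the first choice of a chat completion response."""
    choice = response["choices"][0]
    message = choice["message"]
    return {
        "text": message["content"],
        "finish_reason": choice["finish_reason"],
        "tool_calls": message.get("tool_calls") or [],
        "total_tokens": response["usage"]["total_tokens"],
    }


reply = first_reply(json.loads(raw))
print(reply["text"])
```

When the model invokes a tool, the text may be empty and the tool_calls list carries the invocation details instead, so callers that pass tools should check both.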

    Reference

    Completions API

    POST /v1/completions
    

    Overview

    The Completions API is compatible with OpenAI’s Completions API and can be used with the OpenAI Python client.

    Request

    Context

    | Key | Type | Description | Example |
    |---|---|---|---|
    | Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
    | Request Method | string | HTTP method used in API requests | POST |
    | Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
    | Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true } |
    Table. Completions API - Context

    Path Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Completions API - Path Parameters

    Query Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Completions API - Query Parameters

    Body Parameters

    | Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|---|
    | model | - | string | | Specify the model to use for generating responses | | | "meta-llama/Llama-3.3-70B-Instruct" |
    | prompt | - | array, string | | User input text | | | "" |
    | echo | - | boolean | | Whether to include the input text in the output | false | true/false | true |
    | frequency_penalty | - | number | | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
    | logit_bias | - | object | | Adjust the probability of a specific token (example: { "100": 2.0 }) | null | Key: Token ID, Value: -100 ~ 100 | { "100": 2.0 } |
    | logprobs | - | integer | | Returns token probabilities for the top logprobs count | null | 1 ~ 5 | 5 |
    | max_completion_tokens | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
    | max_tokens (Deprecated) | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
    | n | - | integer | | Specify the number of responses to generate | 1 | | 3 |
    | presence_penalty | - | number | | Adjust the penalty for tokens already present in the existing text | 0 | -2.0 ~ 2.0 | 1.0 |
    | seed | - | integer | | Specify a seed value for controlling randomness | None | | |
    | stop | - | string / array / null | | Stop generation when a specific string appears | null | | "\n" |
    | stream | - | boolean | | Whether to return results in streaming mode | false | true/false | true |
    | stream_options | include_usage, continuous_usage_stats | object | | Control streaming options (e.g., whether to include usage statistics) | null | | { "include_usage": true } |
    | temperature | - | number | | Adjust the creativity of the generation result (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
    | top_p | - | number | | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
    Table. Completions API - Body Parameters

    Example

    curl -X "POST" \
       {AIOS LLM Private Endpoint}/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
        "prompt": "What is the capital of South Korea?",
        "temperature": 0.7
      }'
    Code block. Completions API Request Example

    Response

    200 OK

    | Name | Type | Description |
    |---|---|---|
    | id | string | Unique identifier of the response |
    | object | string | Response object's type (e.g., "text_completion") |
    | created | integer | Creation time (Unix timestamp, in seconds) |
    | model | string | Name of the model used |
    | choices | array | List of generated response options |
    | choices[].index | number | The index of the corresponding choice |
    | choices[].text | string | Generated text |
    | choices[].logprobs | object | Log probability information per token (included depending on settings) |
    | choices[].finish_reason | string or null | Reason the response was terminated (e.g., "stop", "length") |
    | choices[].stop_reason | object or null | Additional stop reason details |
    | choices[].prompt_logprobs | object or null | Log probability per input prompt token (null allowed) |
    | usage | object | Token usage statistics |
    | usage.prompt_tokens | number | Number of tokens used in the input prompt |
    | usage.total_tokens | number | Total token count (input + output) |
    | usage.completion_tokens | number | Number of tokens used in the generated response |
    | usage.prompt_tokens_details | object | Prompt token usage details |
    Table. Completions API - 200 OK

    Error Code

    | HTTP status code | Error Code description |
    |---|---|
    | 400 | Bad Request |
    | 422 | Validation Error |
    | 500 | Internal Server Error |
    Table. Completions API - Error Code

    Example

    {
      "id": "cmpl-scp-aios-completions",
      "object": "text_completion",
      "created": 1749702612,
      "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
      "choices": [
        {
          "index": 0,
          "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
          "logprobs": null,
          "finish_reason": "length",
          "stop_reason": null,
          "prompt_logprobs": null
        }
      ],
      "usage": {
        "prompt_tokens": 9,
        "total_tokens": 25,
        "completion_tokens": 16,
        "prompt_tokens_details": null
      }
    }
    Code block. Completions API Response Example
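
In the response example the finish_reason is "length", which indicates that generation stopped at the token limit rather than at a natural end. The sketch below shows one way a client might detect that and sanity-check the usage counts. The dictionary is a trimmed copy of the example, and `summarize_completion` is a hypothetical helper name.

```python
# Trimmed copy of the response example above.
response = {
    "object": "text_completion",
    "choices": [
        {
            "index": 0,
            "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
            "logprobs": None,
            "finish_reason": "length",
            "stop_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 9, "total_tokens": 25, "completion_tokens": 16},
}


def summarize_completion(resp):
    """Return the first choice's text plus truncation and usage checks."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    return {
        "text": choice["text"].strip(),
        # "length" means the token limit was reached before a stop condition.
        "truncated": choice["finish_reason"] == "length",
        "tokens_add_up": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


summary = summarize_completion(response)
print(summary)
```

If truncated is true, raising max_completion_tokens or adding a stop string are the usual remedies.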

    Reference

    Embedding API

    POST /v1/embeddings
    

    Overview

    The Embedding API converts text into high‑dimensional vectors (embeddings), which can be used for various natural language processing (NLP) tasks such as similarity calculation between texts, clustering, and search.

    Request

    Context

    | Key | Type | Description | Example |
    |---|---|---|---|
    | Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
    | Request Method | string | HTTP method used in API requests | POST |
    | Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
    | Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |
    Table. Embedding API - Context

    Path Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Embedding API - Path Parameters

    Query Parameters

    | Name | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|
    | None | | | | | | |
    Table. Embedding API - Query Parameters

    Body Parameters

    | Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
    |---|---|---|---|---|---|---|---|
    | model | - | string | | Specify the model to use for generating embeddings | | | "sds/bge-m3" |
    | input | - | array<string> | | Text to embed (for example, the user's search query or question) | | | "What is the capital of France?" |
    | encoding_format | - | string | | Specify the format to return the embedding | float | "float", "base64" | [0.01319122314453125, 0.057220458984375, … (omitted) |
    | truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
    Table. Embedding API - Body Parameters

    Example

    curl -X "POST" \
       {AIOS LLM Private Endpoint}/v1/embeddings \
      -H "Content-Type: application/json" \
      -d '{
        "model": "sds/bge-m3",
        "input": "What is the capital of France?",
        "encoding_format": "float"
      }'
    Code block. Embedding API Request Example

    Response

    200 OK

    | Name | Type | Description |
    |---|---|---|
    | id | string | Unique identifier of the response |
    | object | string | Response object's type (example: "list") |
    | created | number | Creation time (Unix timestamp, in seconds) |
    | model | string | Name of the model used |
    | data | array | Array of objects containing embedding results |
    | data.index | number | Order index of the input text (indicates the order when multiple input texts are provided) |
    | data.object | string | Data item type |
    | data.embedding | array | Embedding vector values of the input text (for sds/bge-m3, a 1024-dimensional float array) |
    | usage | object | Token usage statistics |
    | usage.prompt_tokens | number | Number of tokens used in the input prompt |
    | usage.total_tokens | number | Total token count (input + output) |
    | usage.completion_tokens | number | Number of tokens used in the generated response |
    | usage.prompt_tokens_details | object | Prompt token details |
    Table. Embedding API - 200 OK

    Error Code

    | HTTP status code | Error Code description |
    |---|---|
    | 400 | Bad Request |
    | 422 | Validation Error |
    | 500 | Internal Server Error |
    Table. Embedding API - Error Code

    Example

    {
      "id":"embd-scp-aios-embeddings",
      "object":"list","created":1749035024,
      "model":"sds/bge-m3",
      "data":[
        {
          "index":0,
          "object":"embedding",
          "embedding":
          [0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
          -0 ....(omitted)
    
          -0.05706787109375,-0.0147705078125]
        }
      ],
      "usage":
      {
        "prompt_tokens":9,
        "total_tokens":9,
        "completion_tokens":0,
        "prompt_tokens_details":null
      }
    }
    Code block. Embedding API Response Example
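
The example above uses encoding_format "float". When "base64" is requested instead, OpenAI-compatible servers commonly return the vector as a base64 string of packed little-endian 32-bit floats. The sketch below decodes that layout with the standard library. The byte layout is an assumption that this page does not confirm for AIOS, `decode_base64_embedding` is a hypothetical helper name, and the sample values are made up for a round-trip check.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of packed float32 values (assumed little-endian)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with made-up values that float32 represents exactly.
values = [0.5, -0.25, 0.125]
encoded = base64.b64encode(struct.pack("<3f", *values)).decode("ascii")
decoded = decode_base64_embedding(encoded)
print(decoded)
```

Base64 output is smaller on the wire than a JSON float array, which matters mostly when embedding many inputs per request.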

    Reference