
API Reference

API Reference Overview

The APIs supported by AIOS are as follows.

| API Name | API | Detailed Description |
|---|---|---|
| Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list. |
| Score API | POST /score, /v1/score | Predicts the similarity between two sentences. |
| Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client. |
| Completions API | POST /v1/completions | Compatible with OpenAI's Completions API and can be used with the OpenAI Python client. |
| Embedding API | POST /v1/embeddings | Converts text into a high-dimensional vector (embedding) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and searching. |

Table. AIOS Supported API List

Rerank API

POST /rerank, /v1/rerank, /v2/rerank

Overview

The Rerank API applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list. Generally, the score of a sentence pair represents the similarity between the two sentences on a scale of 0 to 1.

  • Embedding-based model: Converts the query and document into vectors and measures the similarity between the vectors (e.g., cosine similarity) to calculate the score.
  • Reranker (Cross-Encoder) based model: Evaluates the query and document together as a single pair and directly predicts the relevance score.
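The embedding-based path can be illustrated with a small, self-contained sketch. The vectors below are toy values standing in for real embedding-model output, and `cosine_similarity` is an illustrative helper, not part of the API:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
query_vec = [0.1, 0.8, 0.3]
doc_vecs = [
    [0.1, 0.7, 0.4],   # close to the query
    [0.9, 0.1, 0.0],   # far from the query
]

# Score each document against the query and rank highest-first --
# the essence of embedding-based reranking.
scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # document 0 outranks document 1
```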

Request

Context

| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |

Table. Re-rank API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Re-rank API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Re-rank API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | | Model used for response generation | | | "sds/bge-reranker-v2-m3" |
| query | - | string | | User's search query or question | | | "What is the capital of France?" |
| documents | - | array | | List of documents to be re-ranked | | Maximum model input length limit | ["The capital of France is Paris."] |
| top_n | - | integer | | Number of top documents to return (0 returns all) | 0 | > 0 | 5 |
| truncate_prompt_tokens | - | integer | | Limits the number of input tokens | | > 0 | 100 |

Table. Re-rank API - Body Parameters

Example

curl -X 'POST' \
  'https://aios.private.kr-west1.e.samsungsdscloud.com/rerank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "The capital of France is Paris.",
      "France capital city is known for the Eiffel Tower.",
      "Paris is located in the north-central part of France."
    ],
    "top_n": 2,
    "truncate_prompt_tokens": 512
  }'

Response

200 OK

| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the API response (UUID format) |
| model | string | Name of the model that generated the result |
| usage | object | Object containing information about the resources used in the request |
| usage.total_tokens | integer | Total number of tokens used in processing the request |
| results | array | Array containing the results for the query-related documents |
| results[].index | integer | Position in the results array |
| results[].document | object | Object containing the content of the searched document |
| results[].document.text | string | Actual text content of the searched document |
| results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |

Table. Re-rank API - 200 OK

Error Code

| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |

Table. Re-rank API - Error Code

Example

{
  "id": "rerank-scp-aios-rerank",
  "model": "sds/sds/bge-m3",
  "usage": {
    "total_tokens": 65
  },
  "results": [
    {
      "index": 0,
      "document": {
        "text": "The capital of France is Paris."
      },
      "relevance_score": 0.8291233777999878
    },
    {
      "index": 1,
      "document": {
        "text": "France capital city is known for the Eiffel Tower."
      },
      "relevance_score": 0.6996355652809143
    }
  ]
}
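A response shaped like the example above can be unpacked in a few lines; `top_documents` is a hypothetical helper name, not part of the API:

```python
def top_documents(response: dict) -> list[tuple[str, float]]:
    """Extract (text, relevance_score) pairs from a Rerank API response."""
    return [
        (item["document"]["text"], item["relevance_score"])
        for item in response["results"]
    ]

# Sample response shaped like the one shown above.
sample = {
    "id": "rerank-scp-aios-rerank",
    "results": [
        {"index": 0,
         "document": {"text": "The capital of France is Paris."},
         "relevance_score": 0.8291233777999878},
        {"index": 1,
         "document": {"text": "France capital city is known for the Eiffel Tower."},
         "relevance_score": 0.6996355652809143},
    ],
}

for text, score in top_documents(sample):
    print(f"{score:.4f}  {text}")
```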

Reference

Score API

POST /score, /v1/score

Overview

The Score API predicts the similarity between two sentences. This API uses one of two models to calculate the score:

  • Reranker (Cross-Encoder) model: Takes a pair of sentences as input and directly predicts the similarity score.
  • Embedding model: Generates embedding vectors for each sentence and calculates the cosine similarity to derive the score.
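When text_1 and text_2 are both arrays, the request and response examples later in this section suggest the sentences are scored pairwise by position. The sketch below builds a request body under that assumption; `build_score_payload` is an illustrative helper, not part of any SDK:

```python
import json

def build_score_payload(model, text_1, text_2, encoding_format="float"):
    """Assemble a Score API request body; text_1/text_2 may be str or list."""
    if isinstance(text_1, list) and isinstance(text_2, list):
        # Pairwise scoring requires the lists to line up by index.
        if len(text_1) != len(text_2):
            raise ValueError("text_1 and text_2 lists must be the same length")
    return {
        "model": model,
        "encoding_format": encoding_format,
        "text_1": text_1,
        "text_2": text_2,
    }

payload = build_score_payload(
    "sds/bge-reranker-v2-m3",
    ["What is the largest planet in the solar system?"],
    ["Jupiter is the largest planet in the solar system."],
)
print(json.dumps(payload, indent=2))
```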

Request

Context

| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |

Table. Score API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Score API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Score API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | | Specify the model to use for response generation | | | "sds/bge-reranker-v2-m3" |
| encoding_format | - | string | | Score return format | "float" | "float", "int" | "float" |
| text_1 | - | string, array | | First text to compare | | Model's maximum input length limit | "What is the capital of France?" |
| text_2 | - | string, array | | Second text to compare | | Model's maximum input length limit | ["The capital of France is Paris."] |
| truncate_prompt_tokens | - | integer | | Limit input token count | | > 0 | 100 |

Table. Score API - Body Parameters

Example

curl -X 'POST' \
  'https://aios.private.kr-west1.e.samsungsdscloud.com/score' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "encoding_format": "float",
    "text_1": [
      "What is the largest planet in the solar system?",
      "What is the chemical symbol for water?"
    ],
    "text_2": [
      "Jupiter is the largest planet in the solar system.",
      "The chemical symbol for water is H₂O."
    ]
  }'

Response

200 OK

| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Type of response object (e.g., "list") |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | List of score calculation results |
| data[].index | integer | Index of the item in the data array |
| data[].object | string | Type of data item (e.g., "score") |
| data[].score | number | Calculated score value, normalized to 0 ~ 1 |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.total_tokens | integer | Total number of tokens (input + output) |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object or null | Detailed information about prompt tokens |

Table. Score API - 200 OK

Error Code

| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |

Table. Score API - Error Code

Example

{
  "id": "score-scp-aios-score",
  "object": "list",
  "created": 1748574112,
  "model": "sds/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": 1.0
    },
    {
      "index": 1,
      "object": "score",
      "score": 1.0
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "total_tokens": 53,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
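Each entry in data carries an index that maps back to the input pair at the same position, so scores can be realigned with the sentences that produced them. `pair_scores` below is a hypothetical helper, not part of the API:

```python
def pair_scores(text_1, text_2, response):
    """Align Score API results with the input sentence pairs by index."""
    by_index = {item["index"]: item["score"] for item in response["data"]}
    return [(a, b, by_index[i]) for i, (a, b) in enumerate(zip(text_1, text_2))]

# Inputs and response shaped like the examples in this section.
text_1 = ["What is the largest planet in the solar system?",
          "What is the chemical symbol for water?"]
text_2 = ["Jupiter is the largest planet in the solar system.",
          "The chemical symbol for water is H₂O."]
response = {"data": [{"index": 0, "object": "score", "score": 1.0},
                     {"index": 1, "object": "score", "score": 1.0}]}

for a, b, s in pair_scores(text_1, text_2, response):
    print(f"{s:.2f}  {a!r} vs {b!r}")
```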


Reference

  • Score API vLLM documentation: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#score-api_1




Chat Completions API

POST /v1/chat/completions

Overview

The Chat Completions API is compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client.

Request

Context

| Key | Type | Description | Example |
|---|---|---|---|
| Content-Type | string | Media type of the request body | application/json |

Table. Chat Completions API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Chat Completions API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Chat Completions API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | | Specifies the model to use for generating responses | | | "meta-llama/Llama-3.3-70B-Instruct" |
| messages | role | string | | List of messages containing the conversation history | | | [{ "role": "user", "content": "message" }] |
| frequency_penalty | - | number | | Adjusts the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | | Adjusts the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | boolean | | Returns the probabilities of the top top_logprobs tokens | false | true, false | true |
| max_completion_tokens | - | integer | | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | | Specifies the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | | Adjusts the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | | Specifies the seed value for controlling randomness | None | | |
| stop | - | string / array / null | | Stops generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | | Returns the result in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | | Controls streaming options (e.g., including usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | | Adjusts the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| tool_choice | - | string | | Specifies which tool to call: "none" calls no tool; "auto" lets the model decide whether to call a tool or generate a message; "required" makes the model call at least one tool | No tools: none; with tools: auto | | |
| tools | - | array | | List of tools that the model can call; only functions are supported, up to 128 | None | | |
| top_logprobs | - | integer | | Number of top tokens to return with log probability values at each position; logprobs must be set to true | None | 0 ~ 20 | 3 |
| top_p | - | number | | Limits the sampling probability of tokens (higher means more tokens are considered) | 1 | 0.0 ~ 1.0 | 0.9 |

Table. Chat Completions API - Body Parameters

Example

curl -X 'POST' \
  'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "messages": [
      { "role": "user", "content": "What is the capital of Korea?" }
    ]
  }'

Response

200 OK

| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of response object (e.g., "chat.completion") |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message object |
| choices[].message.role | string | Role of the message author (e.g., "assistant") |
| choices[].message.content | string | Actual content of the generated message |
| choices[].message.reasoning_content | string | Actual content of the generated reasoning message |
| choices[].message.tool_calls | array (optional) | Tool call information (may be included depending on the model/settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., "stop", "length", etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].logprobs | object or null | Token-wise log probability information (may be included depending on the settings) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.total_tokens | integer | Total number of tokens (input + output) |

Table. Chat Completions API - 200 OK

Error Code

| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |

Table. Chat Completions API - Error Code

Example

{
  "id": "chatcmpl-scp-aios-chat-completions",
  "object": "chat.completion",
  "created": 1749702816,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of Korea is Seoul.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 54,
    "total_tokens": 62,
    "completion_tokens": 8,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
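A 200 OK payload like the example above can be consumed as follows; `assistant_text` is an illustrative helper, not part of the API:

```python
def assistant_text(completion: dict) -> str:
    """Return the assistant message content of the first choice."""
    return completion["choices"][0]["message"]["content"]

# Sample shaped like the response example above (trimmed to the used fields).
sample = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant",
                     "content": "The capital of Korea is Seoul."},
         "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 54, "completion_tokens": 8, "total_tokens": 62},
}

print(assistant_text(sample))  # The capital of Korea is Seoul.
```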

Reference


Completions API

POST /v1/completions

Overview

The Completions API is compatible with OpenAI's Completions API and can be used with the OpenAI Python client.

Request

Context

| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for the API request | POST |
| Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": "true" } |

Table. Completions API - Context
Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Completions API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Completions API - Query Parameters
Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | | Model used to generate the response | | | "meta-llama/Llama-3.3-70B-Instruct" |
| prompt | - | array, string | | User input text | "" | | |
| echo | - | boolean | | Whether to include the input text in the output | false | true/false | true |
| frequency_penalty | - | number | | Adjust the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | | Adjust the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | integer | | Return the probabilities of the top logprobs tokens | null | 1 ~ 5 | 5 |
| max_completion_tokens | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | | Adjust the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | | Specify a seed value for randomness control | None | | |
| stop | - | string / array / null | | Stop generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | | Whether to return the results in a streaming manner | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | | Control streaming options (e.g., include usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | | Control the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| top_p | - | number | | Limit the sampling probability of tokens (higher means more tokens considered) | 1 | 0.0 ~ 1.0 | 0.9 |

Table. Completions API - Body Parameters
Example

curl -X 'POST' \
  'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "prompt": "What is the capital of Korea?",
    "temperature": 0.7
  }'

Response

200 OK

| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., "text_completion") |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | number | Index of the choice |
| choices[].text | string | Generated text |
| choices[].logprobs | object or null | Token-wise log probability information (included based on settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., "stop", "length", etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].prompt_logprobs | object or null | Log probabilities of input prompt tokens (may be null) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Details of prompt token usage |

Table. Completions API - 200 OK

Error Code

| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |

Table. Completions API - Error Code

Example

{
  "id": "cmpl-scp-aios-completions",
  "object": "text_completion",
  "created": 1749702612,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 25,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  }
}
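In the example above finish_reason is "length", meaning generation stopped at the token limit; raising max_completion_tokens allows longer output. The sketch below extracts the generated text and checks the token accounting; `completion_text` and `was_truncated` are illustrative helpers, not part of the API:

```python
def completion_text(response: dict) -> str:
    """Return the generated text of the first choice."""
    return response["choices"][0]["text"]

def was_truncated(response: dict) -> bool:
    """True if generation stopped because the token limit was reached."""
    return response["choices"][0]["finish_reason"] == "length"

# Sample shaped like the response example above (trimmed to the used fields).
sample = {
    "choices": [{"index": 0,
                 "text": " \nOur capital city is Seoul. \n",
                 "finish_reason": "length"}],
    "usage": {"prompt_tokens": 9, "completion_tokens": 16, "total_tokens": 25},
}

usage = sample["usage"]
# total_tokens is the sum of prompt and completion tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(completion_text(sample).strip())
```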

Reference

Embedding API

POST /v1/embeddings

Overview

The Embedding API converts text into high-dimensional vectors (embeddings) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and search.

Request

Context

| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |

Table. Embedding API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Embedding API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |

Table. Embedding API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | | Specify the model to use for generating embeddings | | | "sds/bge-m3" |
| input | - | string, array<string> | | Input text to convert into embeddings | | | "What is the capital of France?" |
| encoding_format | - | string | | Format in which to return the embedding | "float" | "float", "base64" | "float" |
| truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |

Table. Embedding API - Body Parameters

Example

curl -X 'POST' \
  'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "sds/bge-m3",
    "input": "What is the capital of France?",
    "encoding_format": "float"
  }'

Response

200 OK

| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., "list") |
| created | number | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | Array of objects containing embedding results |
| data[].index | number | Index of the input text (i.e., order of the input texts) |
| data[].object | string | Type of data item |
| data[].embedding | array | Embedding vector of the input text (sds/bge-m3 returns a 1024-dimensional float array) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Detailed information about prompt tokens |

Table. Embedding API - 200 OK

Error Code

| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |

Table. Embedding API - Error Code

Example

{
  "id": "embd-scp-aios-embeddings",
  "object": "list",
  "created": 1749035024,
  "model": "sds/bge-m3",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [0.01319122314453125, 0.057220458984375, -0.028533935546875, -0.0008697509765625, -0.01422119140625, 0.033416748046875, -0.0062408447265625, -0.04364013671875, -0.004497528076171875, 0.0008072853088378906, -0.0193328857421875, 0.041168212890625, -0.019317626953125, -0.0188751220703125, -0.047088623046875, ... (omitted) ..., -0.05706787109375, -0.0147705078125]
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
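With encoding_format "base64", the embedding comes back as a base64 string rather than a JSON float array. Assuming the common OpenAI-compatible convention of base64-encoded little-endian float32 values (an assumption, not stated in this document), it can be decoded as follows; `decode_embedding` is an illustrative helper:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding, assuming little-endian float32 values."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip demo with a toy vector (not real model output).
vec = [0.0131912231, 0.0572204589, -0.0285339355]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")
decoded = decode_embedding(encoded)
print(decoded)
```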

Reference
