API Reference Overview
The APIs supported by AIOS are listed below.
| API Name | Endpoint | Description |
|---|---|---|
| Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list. |
| Score API | POST /score, /v1/score | Predicts the similarity between two sentences. |
| Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI’s Chat Completions API and can be used with the OpenAI Python client. |
| Completions API | POST /v1/completions | Compatible with OpenAI’s Completions API and can be used with the OpenAI Python client. |
| Embedding API | POST /v1/embeddings | Converts text into a high-dimensional vector (embedding) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and searching. |
Table. AIOS Supported API List
Rerank API
POST /rerank, /v1/rerank, /v2/rerank
Overview
The Rerank API applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list.
Generally, the score of a sentence pair represents the similarity between the two sentences on a scale of 0 to 1.
- Embedding-based model: Converts the query and document into vectors and measures the similarity between the vectors (e.g., cosine similarity) to calculate the score.
- Reranker (Cross-Encoder) based model: Evaluates the query and document together as a single pair and directly predicts the relevance score.
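For the embedding-based path, the score computation can be sketched as follows (illustrative only; AIOS performs this server-side, and the vectors here are made up):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing in the same direction score 1.0; orthogonal vectors score 0.0.
score = cosine_similarity([0.2, 0.8, 0.1], [0.2, 0.8, 0.1])
```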
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |
Table. Re-rank API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model used for response generation | | | "sds/bge-reranker-v2-m3" |
| query | - | string | ✅ | User's search query or question | | | "What is the capital of France?" |
| documents | - | array | ✅ | List of documents to be re-ranked | | Maximum model input length limit | ["The capital of France is Paris."] |
| top_n | - | integer | ❌ | Number of top documents to return (0 returns all) | 0 | > 0 | 5 |
| truncate_prompt_tokens | - | integer | ❌ | Limits the number of input tokens | | > 0 | 100 |
Table. Re-rank API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/rerank' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"The capital of France is Paris.",
"France capital city is known for the Eiffel Tower.",
"Paris is located in the north-central part of France."
],
"top_n": 2,
"truncate_prompt_tokens": 512
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | API response's unique identifier (UUID format) |
| model | string | Name of the model that generated the result |
| usage | object | Object containing information about the resources used in the request |
| usage.total_tokens | integer | Total number of tokens used in processing the request |
| results | array | Array containing the results for the query-related documents |
| results[].index | integer | Order number in the result array |
| results[].document | object | Object containing the content of the searched document |
| results[].document.text | string | Actual text content of the searched document |
| results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |
Table. Re-rank API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Re-rank API - Error Code
Example
{
"id": "rerank-scp-aios-rerank",
"model": "sds/bge-m3",
"usage": {
"total_tokens": 65
},
"results": [
{
"index": 0,
"document": {
"text": "The capital of France is Paris."
},
"relevance_score": 0.8291233777999878
},
{
"index": 1,
"document": {
"text": "France capital city is known for the Eiffel Tower."
},
"relevance_score": 0.6996355652809143
}
]
}
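A client can consume the response above like this (standard-library-only sketch, reusing the example payload reduced to the fields typically consumed):

```python
import json

# The example response above, reduced to the "results" array a client reads.
body = json.loads('''{
  "results": [
    {"index": 0, "document": {"text": "The capital of France is Paris."}, "relevance_score": 0.8291233777999878},
    {"index": 1, "document": {"text": "France capital city is known for the Eiffel Tower."}, "relevance_score": 0.6996355652809143}
  ]
}''')

# Pick the document with the highest relevance score.
best = max(body["results"], key=lambda r: r["relevance_score"])
print(best["document"]["text"])
```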
Reference
Score API
POST /score, /v1/score
Overview
The Score API predicts the similarity between two sentences. This API uses one of two models to calculate the score:
- Reranker (Cross-Encoder) model: Takes a pair of sentences as input and directly predicts the similarity score.
- Embedding model: Generates embedding vectors for each sentence and calculates the cosine similarity to derive the score.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |
Table. Score API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specify the model to use for response generation | | | "sds/bge-reranker-v2-m3" |
| encoding_format | - | string | ❌ | Score return format | "float" | | "float" |
| text_1 | - | string, array | ✅ | First text to compare | | Model's maximum input length limit | "What is the capital of France?" |
| text_2 | - | string, array | ✅ | Second text to compare | | Model's maximum input length limit | ["The capital of France is Paris."] |
| truncate_prompt_tokens | - | integer | ❌ | Limit input token count | | > 0 | 100 |
Table. Score API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/score' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-reranker-v2-m3",
"encoding_format": "float",
"text_1": [
"What is the largest planet in the solar system?",
"What is the chemical symbol for water?"
],
"text_2": [
"Jupiter is the largest planet in the solar system.",
"The chemical symbol for water is H₂O."
]
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Type of response object (e.g., "list") |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | List of score calculation results |
| data[].index | integer | Index of the item in the data array |
| data[].object | string | Type of data item (e.g., "score") |
| data[].score | number | Calculated score value, normalized to 0 ~ 1 |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.total_tokens | integer | Total number of tokens (input + output) |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object or null | Detailed information about prompt tokens |
Table. Score API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Score API - Error Code
Example
{
"id": "score-scp-aios-score",
"object": "list",
"created": 1748574112,
"model": "sds/bge-reranker-v2-m3",
"data": [
{
"index": 0,
"object": "score",
"score": 1.0
},
{
"index": 1,
"object": "score",
"score": 1.0
}
],
"usage": {
"prompt_tokens": 53,
"total_tokens": 53,
"completion_tokens": 0,
"prompt_tokens_details": null
}
}
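The curl request above can also be issued from plain Python. This is a sketch using only the standard library; `build_score_payload` and `post_score` are hypothetical helper names, and the network call itself is not exercised here:

```python
import json
import urllib.request

BASE_URL = "https://aios.private.kr-west1.e.samsungsdscloud.com"  # base URL from the curl example

def build_score_payload(model, text_1, text_2, **options):
    # The three required body parameters, plus any optional ones
    # (e.g. encoding_format, truncate_prompt_tokens).
    payload = {"model": model, "text_1": text_1, "text_2": text_2}
    payload.update(options)
    return payload

def post_score(payload):
    # Issues POST /score with the headers from the Context table.
    # (Not called here: it requires network access to the AIOS endpoint.)
    req = urllib.request.Request(
        BASE_URL + "/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_score_payload(
    "sds/bge-reranker-v2-m3",
    ["What is the largest planet in the solar system?"],
    ["Jupiter is the largest planet in the solar system."],
    encoding_format="float",
)
```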
Reference
* [Score API vLLM documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#score-api_1)
Chat Completions API
POST /v1/chat/completions
Overview
The Chat Completions API is compatible with OpenAI’s Chat Completions API and can be used with the OpenAI Python client.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Content-Type | string | Media type of the request body | application/json |
Table. Chat Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specifies the model to use for generating responses | | | "meta-llama/Llama-3.3-70B-Instruct" |
| messages | role | array | ✅ | List of messages containing conversation history | | | [ { "role": "user", "content": "message" } ] |
| frequency_penalty | - | number | ❌ | Adjusts the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjusts the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | boolean | ❌ | Returns the probabilities of the top top_logprobs tokens | false | true, false | true |
| max_completion_tokens | - | integer | ❌ | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | ❌ | Specifies the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjusts the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specifies the seed value for controlling randomness | None | | |
| stop | - | string / array / null | ❌ | Stops generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | ❌ | Returns the result in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Controls streaming options (e.g., including usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | ❌ | Adjusts the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| tool_choice | - | string | ❌ | Specifies which tool to call: "none" calls no tool, "auto" lets the model decide, "required" forces at least one tool call | | | |
| tools | - | array | ❌ | List of tools the model can call; only functions are supported, up to 128 | None | | |
| top_logprobs | - | integer | ❌ | Number of top tokens to return with log probabilities; logprobs must be set to true | None | 0 ~ 20 | 3 |
| top_p | - | number | ❌ | Limits the sampling probability of tokens (higher means more tokens are considered) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Chat Completions API - Body Parameters
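The required and optional fields above can be assembled into a request body as follows (a minimal sketch; `build_chat_request` is a hypothetical helper, not part of AIOS):

```python
import json

def build_chat_request(model, user_message, **options):
    # Only "model" and "messages" are required; everything else is optional.
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    body.update(options)  # e.g. temperature=0.7, stream=True, max_completion_tokens=100
    return json.dumps(body)

request_body = build_chat_request(
    "meta-llama/Llama-3.3-70B-Instruct",
    "What is the capital of Korea?",
    temperature=0.7,
)
```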
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"messages": [
{ "role": "user", "content": "What is the capital of Korea?" }
]
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Response’s unique identifier |
| object | string | Type of response object (e.g., “chat.completion”) |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message object |
| choices[].message.role | string | Role of the message author (e.g., “assistant”) |
| choices[].message.content | string | Actual content of the generated message |
| choices[].message.reasoning_content | string | Actual content of the generated reasoning message |
| choices[].message.tool_calls | array (optional) | Tool call information (may be included depending on the model/settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., “stop”, “length”, etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].logprobs | object or null | Token-wise log probability information (may be included depending on the settings) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.total_tokens | integer | Total number of tokens (input + output) |
Table. Chat Completions API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Chat Completions API - Error Code
Example
{
"id": "chatcmpl-scp-aios-chat-completions",
"object": "chat.completion",
"created": 1749702816,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": null,
"content": "The capital of Korea is Seoul.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 54,
"total_tokens": 62,
"completion_tokens": 8,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
Reference
Completions API
POST /v1/completions
Overview
The Completions API is compatible with OpenAI's Completions API and can be used with the OpenAI Python client.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | API request URL for AIOS | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for the API request | POST |
| Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true } |
Table. Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model used to generate the response | | | "meta-llama/Llama-3.3-70B-Instruct" |
| prompt | - | array, string | ✅ | User input text | | | "" |
| echo | - | boolean | ❌ | Whether to include the input text in the output | false | true/false | true |
| frequency_penalty | - | number | ❌ | Adjust the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjust the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | integer | ❌ | Return the probabilities of the top logprobs tokens | null | 1 ~ 5 | 5 |
| max_completion_tokens | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | ❌ | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjust the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specify a seed value for randomness control | None | | |
| stop | - | string / array / null | ❌ | Stop generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | ❌ | Whether to return the results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Control streaming options (e.g., include usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | ❌ | Control the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| top_p | - | number | ❌ | Limit the sampling probability of tokens (higher means more tokens considered) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Completions API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"prompt": "What is the capital of Korea?",
"temperature": 0.7
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., “text_completion”) |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | number | Index of the choice |
| choices[].text | string | Generated text object |
| choices[].logprobs | object | Token-wise log probability information (included based on settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., “stop”, “length” etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].prompt_logprobs | object or null | Log probability of input prompt tokens (may be null) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Details of prompt token usage |
Table. Completions API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Completions API - Error Code
Example
{
"id": "cmpl-scp-aios-completions",
"object": "text_completion",
"created": 1749702612,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
"logprobs": null,
"finish_reason": "length",
"stop_reason": null,
"prompt_logprobs": null
}
],
"usage": {
"prompt_tokens": 9,
"total_tokens": 25,
"completion_tokens": 16,
"prompt_tokens_details": null
}
}
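Consuming the response above looks like this (a standard-library sketch; the example text is abbreviated from the payload shown, and a `finish_reason` of "length" means the token limit was reached before a natural stop):

```python
import json

# The example response above, reduced to the fields a client consumes.
response = json.loads('''{
  "choices": [
    {"index": 0, "text": "Our capital city is Seoul.", "finish_reason": "length"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 16, "total_tokens": 25}
}''')

choice = response["choices"][0]
generated_text = choice["text"]
# "length" indicates truncation; raising max_completion_tokens allows a longer answer.
truncated = choice["finish_reason"] == "length"
```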
Reference
Embedding API
Overview
The Embedding API converts text into high-dimensional vectors (embeddings) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and search.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | URL for AIOS API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |
Table. Embedding API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specify the model to use for generating embeddings | | | "sds/bge-m3" |
| input | - | string, array | ✅ | Text to embed (e.g., the user's search query or question) | | | "What is the capital of France?" |
| encoding_format | - | string | ❌ | Specify the format in which to return the embedding | "float" | "float", "base64" | "float" |
| truncate_prompt_tokens | - | integer | ❌ | Limit the number of input tokens | | > 0 | 100 |
Table. Embedding API - Body Parameters
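When encoding_format is "base64", the returned vector must be decoded before use. A sketch under the assumption that the payload follows the common OpenAI-compatible convention of packed little-endian float32 values (`decode_base64_embedding` is a hypothetical helper):

```python
import base64
import struct

def decode_base64_embedding(b64):
    # Assumption: the base64 string contains the embedding as a packed
    # array of little-endian float32 values.
    raw = base64.b64decode(b64)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

# Round-trip a small synthetic vector to show the layout.
vec = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```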
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/embeddings' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-m3",
"input": "What is the capital of France?",
"encoding_format": "float"
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., “list”) |
| created | number | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | Array of objects containing embedding results |
| data.index | number | Index of the input text (e.g., order of input texts) |
| data.object | string | Type of data item |
| data.embedding | array | Embedding vector values of the input text (sds/bge-m3 returns a 1024-dimensional float array) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Detailed information about prompt tokens |
Table. Embedding API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Embedding API - Error Code
Example
{
"id":"embd-scp-aios-embeddings",
"object":"list","created":1749035024,
"model":"sds/bge-m3",
"data":[
{
"index":0,
"object":"embedding",
"embedding":
[0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
-0 ....(omitted)
-0.05706787109375,-0.0147705078125]
}
],
"usage":
{
"prompt_tokens":9,
"total_tokens":9,
"completion_tokens":0,
"prompt_tokens_details":null
}
}
Reference