API Reference Overview
The APIs supported by AIOS are listed below.
| API Name | Endpoint | Description |
|---|---|---|
| Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list. |
| Score API | POST /score, /v1/score | Predicts the similarity between two sentences. |
| Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI’s Chat Completions API and can be used with the OpenAI Python client. |
| Completions API | POST /v1/completions | Compatible with OpenAI’s Completions API and can be used with the OpenAI Python client. |
| Embedding API | POST /v1/embeddings | Converts text into a high-dimensional vector (embedding) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and searching. |
Table. AIOS Supported API List
Rerank API
POST /rerank, /v1/rerank, /v2/rerank
Overview
The Rerank API applies an embedding model or cross-encoder model to predict the relevance between a single query and each item in a document list.
Generally, the score of a sentence pair represents the similarity between the two sentences on a scale of 0 to 1.
- Embedding-based model: Converts the query and document into vectors and measures the similarity between the vectors (e.g., cosine similarity) to calculate the score.
- Reranker (Cross-Encoder) based model: Evaluates the query and document together as a single pair and directly predicts the relevance score.
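For the embedding-based path, the score computation can be sketched as follows (illustrative only; AIOS performs this server-side, and the vectors here are made up):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing in the same direction score 1.0; orthogonal vectors score 0.0.
score = cosine_similarity([0.2, 0.8, 0.1], [0.2, 0.8, 0.1])
```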
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |
Table. Re-rank API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model used for response generation | | | "sds/bge-reranker-v2-m3" |
| query | - | string | ✅ | User's search query or question | | | "What is the capital of France?" |
| documents | - | array | ✅ | List of documents to be re-ranked | | Maximum model input length limit | ["The capital of France is Paris."] |
| top_n | - | integer | ❌ | Number of top documents to return (0 returns all) | 0 | > 0 | 5 |
| truncate_prompt_tokens | - | integer | ❌ | Limits the number of input tokens | | > 0 | 100 |
Table. Re-rank API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/rerank' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"The capital of France is Paris.",
"France capital city is known for the Eiffel Tower.",
"Paris is located in the north-central part of France."
],
"top_n": 2,
"truncate_prompt_tokens": 512
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | API response's unique identifier (UUID format) |
| model | string | Name of the model that generated the result |
| usage | object | Object containing information about the resources used in the request |
| usage.total_tokens | integer | Total number of tokens used in processing the request |
| results | array | Array containing the results for the query-related documents |
| results[].index | integer | Order number in the result array |
| results[].document | object | Object containing the content of the searched document |
| results[].document.text | string | Actual text content of the searched document |
| results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |
Table. Re-rank API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Re-rank API - Error Code
Example
{
"id": "rerank-scp-aios-rerank",
"model": "sds/bge-m3",
"usage": {
"total_tokens": 65
},
"results": [
{
"index": 0,
"document": {
"text": "The capital of France is Paris."
},
"relevance_score": 0.8291233777999878
},
{
"index": 1,
"document": {
"text": "France capital city is known for the Eiffel Tower."
},
"relevance_score": 0.6996355652809143
}
]
}
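A client can consume the response above like this (standard-library-only sketch, reusing the example payload reduced to the fields typically consumed):

```python
import json

# The example response above, reduced to the "results" array a client reads.
body = json.loads('''{
  "results": [
    {"index": 0, "document": {"text": "The capital of France is Paris."}, "relevance_score": 0.8291233777999878},
    {"index": 1, "document": {"text": "France capital city is known for the Eiffel Tower."}, "relevance_score": 0.6996355652809143}
  ]
}''')

# Pick the document with the highest relevance score.
best = max(body["results"], key=lambda r: r["relevance_score"])
print(best["document"]["text"])
```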
Reference
Score API
POST /score, /v1/score
Overview
The Score API predicts the similarity between two sentences. This API uses one of two models to calculate the score:
- Reranker (Cross-Encoder) model: Takes a pair of sentences as input and directly predicts the similarity score.
- Embedding model: Generates embedding vectors for each sentence and calculates the cosine similarity to derive the score.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |
Table. Score API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specify the model to use for response generation | | | "sds/bge-reranker-v2-m3" |
| encoding_format | - | string | ❌ | Score return format | "float" | | "float" |
| text_1 | - | string, array | ✅ | First text to compare | | Model's maximum input length limit | "What is the capital of France?" |
| text_2 | - | string, array | ✅ | Second text to compare | | Model's maximum input length limit | ["The capital of France is Paris."] |
| truncate_prompt_tokens | - | integer | ❌ | Limit input token count | | > 0 | 100 |
Table. Score API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/score' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-reranker-v2-m3",
"encoding_format": "float",
"text_1": [
"What is the largest planet in the solar system?",
"What is the chemical symbol for water?"
],
"text_2": [
"Jupiter is the largest planet in the solar system.",
"The chemical symbol for water is H₂O."
]
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for the response |
| object | string | Type of response object (e.g., "list") |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | List of score calculation results |
| data[].index | integer | Index of the item in the data array |
| data[].object | string | Type of data item (e.g., "score") |
| data[].score | number | Calculated score value, normalized to 0 ~ 1 |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.total_tokens | integer | Total number of tokens (input + output) |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object or null | Detailed information about prompt tokens |
Table. Score API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Score API - Error Code
Example
{
"id": "score-scp-aios-score",
"object": "list",
"created": 1748574112,
"model": "sds/bge-reranker-v2-m3",
"data": [
{
"index": 0,
"object": "score",
"score": 1.0
},
{
"index": 1,
"object": "score",
"score": 1.0
}
],
"usage": {
"prompt_tokens": 53,
"total_tokens": 53,
"completion_tokens": 0,
"prompt_tokens_details": null
}
}
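The curl request above can also be issued from plain Python. This is a sketch using only the standard library; `build_score_payload` and `post_score` are hypothetical helper names, and the network call itself is not exercised here:

```python
import json
import urllib.request

BASE_URL = "https://aios.private.kr-west1.e.samsungsdscloud.com"  # base URL from the curl example

def build_score_payload(model, text_1, text_2, **options):
    # The three required body parameters, plus any optional ones
    # (e.g. encoding_format, truncate_prompt_tokens).
    payload = {"model": model, "text_1": text_1, "text_2": text_2}
    payload.update(options)
    return payload

def post_score(payload):
    # Issues POST /score with the headers from the Context table.
    # (Not called here: it requires network access to the AIOS endpoint.)
    req = urllib.request.Request(
        BASE_URL + "/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_score_payload(
    "sds/bge-reranker-v2-m3",
    ["What is the largest planet in the solar system?"],
    ["Jupiter is the largest planet in the solar system."],
    encoding_format="float",
)
```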
Reference
* [Score API vLLM documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#score-api_1)
Chat Completions API
POST /v1/chat/completions
Overview
The Chat Completions API is compatible with OpenAI’s Chat Completions API and can be used with the OpenAI Python client.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Content-Type | string | Media type of the request body | application/json |
Table. Chat Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specifies the model to use for generating responses | | | "meta-llama/Llama-3.3-70B-Instruct" |
| messages | role | array | ✅ | List of messages containing conversation history | | | [ { "role": "user", "content": "message" } ] |
| frequency_penalty | - | number | ❌ | Adjusts the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjusts the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | boolean | ❌ | Returns the probabilities of the top top_logprobs tokens | false | true, false | true |
| max_completion_tokens | - | integer | ❌ | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limits the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | ❌ | Specifies the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjusts the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specifies the seed value for controlling randomness | None | | |
| stop | - | string / array / null | ❌ | Stops generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | ❌ | Returns the result in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Controls streaming options (e.g., including usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | ❌ | Adjusts the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| tool_choice | - | string | ❌ | Specifies which tool to call: "none" calls no tool, "auto" lets the model decide, "required" forces at least one tool call | | | |
| tools | - | array | ❌ | List of tools the model can call; only functions are supported, up to 128 | None | | |
| top_logprobs | - | integer | ❌ | Number of top tokens to return with log probabilities; logprobs must be set to true | None | 0 ~ 20 | 3 |
| top_p | - | number | ❌ | Limits the sampling probability of tokens (higher means more tokens are considered) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Chat Completions API - Body Parameters
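The required and optional fields above can be assembled into a request body as follows (a minimal sketch; `build_chat_request` is a hypothetical helper, not part of AIOS):

```python
import json

def build_chat_request(model, user_message, **options):
    # Only "model" and "messages" are required; everything else is optional.
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    body.update(options)  # e.g. temperature=0.7, stream=True, max_completion_tokens=100
    return json.dumps(body)

request_body = build_chat_request(
    "meta-llama/Llama-3.3-70B-Instruct",
    "What is the capital of Korea?",
    temperature=0.7,
)
```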
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"messages": [
{ "role": "user", "content": "What is the capital of Korea?" }
]
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Response’s unique identifier |
| object | string | Type of response object (e.g., “chat.completion”) |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | integer | Index of the choice |
| choices[].message | object | Generated message object |
| choices[].message.role | string | Role of the message author (e.g., “assistant”) |
| choices[].message.content | string | Actual content of the generated message |
| choices[].message.reasoning_content | string | Actual content of the generated reasoning message |
| choices[].message.tool_calls | array (optional) | Tool call information (may be included depending on the model/settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., “stop”, “length”, etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].logprobs | object or null | Token-wise log probability information (may be included depending on the settings) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.total_tokens | integer | Total number of tokens (input + output) |
Table. Chat Completions API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Chat Completions API - Error Code
Example
{
"id": "chatcmpl-scp-aios-chat-completions",
"object": "chat.completion",
"created": 1749702816,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": null,
"content": "The capital of Korea is Seoul.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 54,
"total_tokens": 62,
"completion_tokens": 8,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
Reference
Completions API
POST /v1/completions
Overview
The Completions API is compatible with OpenAI's Completions API and can be used with the OpenAI Python client.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | API request URL for AIOS | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for the API request | POST |
| Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true } |
Table. Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model used to generate the response | | | "meta-llama/Llama-3.3-70B-Instruct" |
| prompt | - | array, string | ✅ | User input text | | | "" |
| echo | - | boolean | ❌ | Whether to include the input text in the output | false | true/false | true |
| frequency_penalty | - | number | ❌ | Adjust the penalty for repeating tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjust the probability of specific tokens | null | Key: token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | integer | ❌ | Return the probabilities of the top logprobs tokens | null | 1 ~ 5 | 5 |
| max_completion_tokens | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum | 100 |
| n | - | integer | ❌ | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjust the penalty for tokens already present in the text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specify a seed value for randomness control | None | | |
| stop | - | string / array / null | ❌ | Stop generating when a specific string is encountered | null | | "\n" |
| stream | - | boolean | ❌ | Whether to return the results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Control streaming options (e.g., include usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | ❌ | Control the creativity of the generated response (higher means more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| top_p | - | number | ❌ | Limit the sampling probability of tokens (higher means more tokens considered) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Completions API - Body Parameters
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"prompt": "What is the capital of Korea?",
"temperature": 0.7
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., “text_completion”) |
| created | integer | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response choices |
| choices[].index | number | Index of the choice |
| choices[].text | string | Generated text object |
| choices[].logprobs | object | Token-wise log probability information (included based on settings) |
| choices[].finish_reason | string or null | Reason why the response was terminated (e.g., “stop”, “length” etc.) |
| choices[].stop_reason | object or null | Additional termination reason details |
| choices[].prompt_logprobs | object or null | Log probability of input prompt tokens (may be null) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Details of prompt token usage |
Table. Completions API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Completions API - Error Code
Example
{
"id": "cmpl-scp-aios-completions",
"object": "text_completion",
"created": 1749702612,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
"logprobs": null,
"finish_reason": "length",
"stop_reason": null,
"prompt_logprobs": null
}
],
"usage": {
"prompt_tokens": 9,
"total_tokens": 25,
"completion_tokens": 16,
"prompt_tokens_details": null
}
}
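Consuming the response above looks like this (a standard-library sketch; the example text is abbreviated from the payload shown, and a `finish_reason` of "length" means the token limit was reached before a natural stop):

```python
import json

# The example response above, reduced to the fields a client consumes.
response = json.loads('''{
  "choices": [
    {"index": 0, "text": "Our capital city is Seoul.", "finish_reason": "length"}
  ],
  "usage": {"prompt_tokens": 9, "completion_tokens": 16, "total_tokens": 25}
}''')

choice = response["choices"][0]
generated_text = choice["text"]
# "length" indicates truncation; raising max_completion_tokens allows a longer answer.
truncated = choice["finish_reason"] == "length"
```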
Reference
Embedding API
Overview
The Embedding API converts text into high-dimensional vectors (embeddings) that can be used for various natural language processing (NLP) tasks, such as calculating text similarity, clustering, and search.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | URL for AIOS API requests | https://aios.private.kr-west1.e.samsungsdscloud.com |
| Request Method | string | HTTP method used for API requests | POST |
| Headers | object | Header information required for requests | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |
Table. Embedding API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specify the model to use for generating embeddings | | | "sds/bge-m3" |
| input | - | string, array | ✅ | Text to embed (e.g., the user's search query or question) | | | "What is the capital of France?" |
| encoding_format | - | string | ❌ | Specify the format in which to return the embedding | "float" | "float", "base64" | "float" |
| truncate_prompt_tokens | - | integer | ❌ | Limit the number of input tokens | | > 0 | 100 |
Table. Embedding API - Body Parameters
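When encoding_format is "base64", the returned vector must be decoded before use. A sketch under the assumption that the payload follows the common OpenAI-compatible convention of packed little-endian float32 values (`decode_base64_embedding` is a hypothetical helper):

```python
import base64
import struct

def decode_base64_embedding(b64):
    # Assumption: the base64 string contains the embedding as a packed
    # array of little-endian float32 values.
    raw = base64.b64decode(b64)
    return list(struct.unpack("<%df" % (len(raw) // 4), raw))

# Round-trip a small synthetic vector to show the layout.
vec = [0.25, -0.5, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *vec)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```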
Example
curl -X 'POST' \
'https://aios.private.kr-west1.e.samsungsdscloud.com/v1/embeddings' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "sds/bge-m3",
"input": "What is the capital of France?",
"encoding_format": "float"
}'
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Type of the response object (e.g., “list”) |
| created | number | Creation time (Unix timestamp, seconds) |
| model | string | Name of the model used |
| data | array | Array of objects containing embedding results |
| data.index | number | Index of the input text (e.g., order of input texts) |
| data.object | string | Type of data item |
| data.embedding | array | Embedding vector values of the input text (sds/bge-m3 returns a 1024-dimensional float array) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total number of tokens (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Detailed information about prompt tokens |
Table. Embedding API - 200 OK
Error Code
| HTTP status code | Description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Embedding API - Error Code
Example
{
"id":"embd-scp-aios-embeddings",
"object":"list","created":1749035024,
"model":"sds/bge-m3",
"data":[
{
"index":0,
"object":"embedding",
"embedding":
[0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
-0 ....(omitted)
-0.05706787109375,-0.0147705078125]
}
],
"usage":
{
"prompt_tokens":9,
"total_tokens":9,
"completion_tokens":0,
"prompt_tokens_details":null
}
}
Reference