The page has been translated by Gen AI.

API Reference

API Reference Overview

The API references supported by AIOS are as follows.

API name	API	Detailed description
Rerank API	POST /rerank, /v1/rerank, /v2/rerank	Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a document list.
Score API	POST /score, /v1/score	Predicts the similarity between two sentences.
Chat Completions API	POST /v1/chat/completions	Compatible with OpenAI’s Chat Completions API; can be used with the OpenAI Python client.
Completions API	POST /v1/completions	Compatible with OpenAI’s Completions API; can be used with the OpenAI Python client.
Embedding API	POST /v1/embeddings	Converts text into high-dimensional vectors (embeddings) for natural language processing (NLP) tasks such as text similarity calculation, clustering, and search.

Table. AIOS supported API list

Rerank API

POST /rerank, /v1/rerank, /v2/rerank

Overview

The Rerank API predicts the relevance between a single query and each item in a document list by applying an embedding model or a cross-encoder model. Generally, the score of a sentence pair represents the similarity between the two sentences on a scale from 0 to 1.

Embedding-based model: Converts the query and each document into vectors, then computes a score from the similarity between the vectors (e.g., cosine similarity).
Reranker (cross-encoder) based model: Feeds each query-document pair into the model together and scores the pair directly.
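
The embedding-based scoring described above can be sketched in a few lines of Python. This is only an illustration: the function name `cosine_similarity` and the toy vectors are assumptions made for the example, not part of AIOS, and real embeddings have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for one query embedding and two document embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
```

The first document points in the same direction as the query and scores close to 1, while the second is orthogonal and scores 0. A cross-encoder skips this vector step and scores each pair jointly.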

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM Private Endpoint`
Request Method	string	HTTP methods used in API requests	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-m3", "query": …, "documents": […] }`

Table. Re-rank API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Re-rank API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Re-rank API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model used to compute relevance scores			`"sds/bge-reranker-v2-m3"`
query	-	string	✅	User’s search query or question			`"What is the capital of France?"`
documents	-	array	✅	List of documents to be reordered		Maximum model input length limit	`["The capital of France is Paris."]`
top_n	-	integer	❌	Number of top-ranked documents to return (0 returns all)	0	≥ 0	`5`
truncate_prompt_tokens	-	integer	❌	Limit the number of input tokens		> 0	`100`

Table. Re-rank API - Body Parameters

Example

curl -X "POST" \
   {AIOS LLM private endpoint}/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "The capital of France is Paris.",
      "France capital city is known for the Eiffel Tower.",
      "Paris is located in the north-central part of France."
    ],
    "top_n": 2, 
    "truncate_prompt_tokens": 512
  }'

Code block. Re-Rank API Request Example
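
The same request can be assembled from Python with only the standard library. The sketch below builds the request without sending it; the helper name `build_rerank_request` and the host `http://aios.example.internal` are placeholders invented for illustration, so substitute your own AIOS LLM private endpoint.

```python
import json
import urllib.request

def build_rerank_request(base_url, model, query, documents, top_n=0):
    """Build (but do not send) a POST request for the /rerank endpoint."""
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder host, not a real endpoint.
request = build_rerank_request(
    "http://aios.example.internal",
    "sds/bge-reranker-v2-m3",
    "What is the capital of France?",
    [
        "The capital of France is Paris.",
        "Paris is located in the north-central part of France.",
    ],
    top_n=2,
)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the body with `json.load`.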

Response

200 OK

Name	Type	Description
id	string	Unique identifier of the API response (UUID format)
model	string	Name of the model that generated the result
usage	object	Object containing resource information used in the request
usage.total_tokens	integer	Total number of tokens used for request processing
results	array	Array of results for the documents related to the query
results[].index	integer	Index of the document in the request’s documents list
results[].document	object	An object containing the contents of the retrieved document
results[].document.text	string	The actual text content of the retrieved document
results[].relevance_score	float	Score indicating the relevance between the query and the document (0 ~ 1)

Table. Re-rank API - 200 OK

Error Code

HTTP status code	ErrorCode description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Re-rank API - Error Code
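
A client usually wants to treat these codes differently. The sketch below shows one possible policy, which is an assumption rather than AIOS guidance: 400 and 422 mean the request itself must be corrected, while 500 may succeed on a later attempt. The function name `should_retry` is hypothetical.

```python
def should_retry(status_code):
    """Return True when a failed request is worth retrying unchanged."""
    if status_code in (400, 422):
        # The request is malformed or fails validation: fix it before resending.
        return False
    if status_code == 500:
        # Server-side failure: the same request may succeed later.
        return True
    # Codes outside the documented table: be conservative and do not retry.
    return False

decisions = {code: should_retry(code) for code in (400, 422, 500)}
```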

Example

{
  "id": "rerank-scp-aios-rerank",
  "model": "sds/bge-m3",
  "usage": {
    "total_tokens": 65
  },
  "results": [
    {
      "index": 0,
      "document": {
        "text": "The capital of France is Paris."
      },
      "relevance_score": 0.8291233777999878
    },
    {
      "index": 1,
      "document": {
        "text": "France capital city is known for the Eiffel Tower."
      },
      "relevance_score": 0.6996355652809143
    }
  ]
}

Code block. Re-Rank API Response Example
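
A response of this shape can be reduced to a ranked list of texts. The following is a minimal sketch: the helper `ranked_documents` and the `min_score` cutoff of 0.7 are illustrative assumptions, and the JSON is a trimmed copy of the example response.

```python
import json

response_text = """
{
  "id": "rerank-scp-aios-rerank",
  "usage": {"total_tokens": 65},
  "results": [
    {"index": 0,
     "document": {"text": "The capital of France is Paris."},
     "relevance_score": 0.8291233777999878},
    {"index": 1,
     "document": {"text": "France capital city is known for the Eiffel Tower."},
     "relevance_score": 0.6996355652809143}
  ]
}
"""

def ranked_documents(response, min_score=0.0):
    """Return (score, text) pairs at or above min_score, best first."""
    pairs = [
        (item["relevance_score"], item["document"]["text"])
        for item in response["results"]
        if item["relevance_score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

# Keep only documents scoring at least 0.7 (an arbitrary threshold).
ranked = ranked_documents(json.loads(response_text), min_score=0.7)
```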

Reference

Rerank API vLLM documentation

Score API

POST /score, /v1/score

Overview

The Score API predicts the similarity between two sentences. This API calculates the score using one of two models.

Reranker (cross-encoder) model: Takes a pair of sentences as input and directly predicts a similarity score.
Embedding model: Generates an embedding vector for each sentence, then computes the cosine similarity between the vectors to derive a score.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM Private Endpoint`
Request Method	string	HTTP methods used in API requests	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] }`

Table. Score API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Score API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Score API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model used to compute the score			`"sds/bge-reranker-v2-m3"`
encoding_format	-	string	❌	Score return format	float	"float" (default), "int"	`"float"`
text_1	-	string, array	✅	First text to compare		String; up to the maximum input length of the model	`"What is the capital of France?"`
text_2	-	string, array	✅	Second text to compare		String; up to the maximum input length of the model	`["The capital of France is Paris."]`
truncate_prompt_tokens	-	integer	❌	Limit the number of input tokens		> 0	`100`

Table. Score API - Body Parameters

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/score \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "encoding_format": "float",
    "text_1": [
      "What is the largest planet in the solar system?",
      "What is the chemical symbol for water?"
    ],
    "text_2": [
      "Jupiter is the largest planet in the solar system.",
      "The chemical formula of water is H₂O."
    ]
  }'

Code block. Score API Request Example

Response

200 OK

Name	Type	Description
id	string	Unique identifier of the response
object	string	Response object’s type (example: “list” )
created	integer	Creation time (Unix timestamp, in seconds)
model	string	Name of the model used
data	array	List of score calculation results
data[].index	integer	Index of the item in the data array
data[].object	string	Data item type (example: “score”)
data[].score	number	Calculated score value, normalized to a range of 0 to 1.
usage	object	Token usage statistics
usage.prompt_tokens	integer	Number of tokens used in the input prompt
usage.total_tokens	integer	Total token count (input + output)
usage.completion_tokens	integer	Number of tokens used in the generated response
usage.prompt_tokens_details	null	Prompt token details

Table. Score API - 200 OK

Error Code

HTTP status code	ErrorCode description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Score API - Error Code

Example

{
  "id": "score-scp-aios-score",
  "object": "list",
  "created": 1748574112,
  "model": "sds/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": 1.0
    },
    {
      "index": 1,
      "object": "score",
      "score": 1.0
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "total_tokens": 53,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}

Code block. Score API Response Example
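
To make the scores easier to read, each entry in `data` can be joined back to the texts that produced it. The sketch below assumes that when `text_1` and `text_2` are arrays of equal length, the entry with index `i` scores the pair `(text_1[i], text_2[i])`; this pairing rule is an assumption not stated on this page, and `label_scores` is a hypothetical helper.

```python
def label_scores(text_1, text_2, response):
    """Attach each score in response["data"] to its (text_1, text_2) pair."""
    labeled = []
    for item in sorted(response["data"], key=lambda entry: entry["index"]):
        i = item["index"]
        labeled.append((text_1[i], text_2[i], item["score"]))
    return labeled

questions = [
    "What is the largest planet in the solar system?",
    "What is the chemical symbol for water?",
]
answers = [
    "Jupiter is the largest planet in the solar system.",
    "The chemical formula of water is H₂O.",
]
# Trimmed copy of the example response.
response = {
    "data": [
        {"index": 0, "object": "score", "score": 1.0},
        {"index": 1, "object": "score", "score": 1.0},
    ]
}
labeled = label_scores(questions, answers, response)
```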

Reference

Score API vLLM documentation

Chat Completions API

POST /v1/chat/completions

Overview

The Chat Completions API is compatible with OpenAI’s Chat Completions API and can be used with the OpenAI Python client.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM Private Endpoint`
Request Method	string	HTTP methods used in API requests	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{"model": "meta-llama/Llama-3.3-70B-Instruct", "messages": [{"role": "user", "content": "hello"}], "stream": true }`

Table. Chat Completions API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Chat Completions API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Chat Completions API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Specify the model to use for response generation			`"meta-llama/Llama-3.3-70B-Instruct"`
messages	role, content	array	✅	Message list containing conversation history			`[ { "role": "user", "content": "message" } ]`
frequency_penalty	-	number	❌	Adjust the penalty for repeated tokens	0	-2.0 ~ 2.0	`0.5`
logit_bias	-	object	❌	Adjust the probability of a specific token (example: { "100": 2.0 })	null	Key: Token ID, Value: -100 ~ 100	`{ "100": 2.0 }`
logprobs	-	boolean	❌	Whether to return log probabilities of the output tokens	false	true, false	`true`
max_completion_tokens	-	integer	❌	Limit the maximum number of generated tokens	None	0 ~ model maximum value	`100`
max_tokens (Deprecated)	-	integer	❌	Limit the maximum number of generated tokens	None	0 ~ model maximum value	`100`
n	-	integer	❌	Specify the number of responses to generate	1		`3`
presence_penalty	-	number	❌	Adjust the penalty for tokens that already appear in the existing text	0	-2.0 ~ 2.0	`1.0`
seed	-	integer	❌	Specify the seed value for controlling randomness	None
stop	-	string / array / null	❌	Stop generation when a specific string appears	null		`"\n"`
stream	-	boolean	❌	Whether to return results in streaming mode	false	true/false	`true`
stream_options	include_usage, continuous_usage_stats	object	❌	Control streaming options (e.g., whether to include usage statistics)	null		`{ "include_usage": true }`
temperature	-	number	❌	Adjust the creativity of the generated output (higher values are more random)	1	0.0 ~ 1.0	`0.7`
tool_choice	-	string	❌	Controls which tool, if any, the model invokes. none: do not invoke any tool. auto: let the model choose between generating a message and invoking a tool. required: the model must invoke one or more tools.	none when no tools are provided; auto when tools are provided
tools	-	array	❌	List of tools the model can invoke. Only functions are supported as tools, up to 128 functions.	None
top_logprobs	-	integer	❌	Number of most likely tokens to return at each token position (an integer between 0 and 20), each with its log probability. Requires logprobs to be set to true.	None	0 ~ 20	`3`
top_p	-	number	❌	Limit the sampling probability of tokens (higher values consider more tokens)	1	0.0 ~ 1.0	`0.9`

Table. Chat Completions API - Body Parameters
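
When `stream` is true, the reply arrives in pieces rather than as one JSON document. The sketch below assumes AIOS follows the OpenAI streaming convention of server-sent event lines of the form `data: {...}` ending with `data: [DONE]`, and that `stream_options.include_usage` adds a final chunk carrying `usage`. The sample lines are hand-written for illustration, not captured from AIOS, and `collect_stream_text` is a hypothetical helper.

```python
import json

def collect_stream_text(lines):
    """Concatenate delta content from OpenAI-style streaming lines."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage

# Hand-written sample lines (assumed format).
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "Seoul"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "."}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 54, "completion_tokens": 2, "total_tokens": 56}}',
    "data: [DONE]",
]
text, usage = collect_stream_text(sample_lines)
```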

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of Korea?"
      }
    ]
  }'

Code block. Chat Completions API Request Example
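
Because the endpoint is OpenAI-compatible, the request can also go through the OpenAI Python client. The sketch below makes several assumptions: the `openai` package (v1 or later) is installed, the client's base URL is the AIOS endpoint followed by `/v1`, and no API key is enforced, so a dummy value is passed because the client requires a non-empty one. The helper names `build_messages` and `ask` are hypothetical.

```python
def build_messages(question, system_prompt="You are a helpful assistant."):
    """Return the messages array for a single-turn chat request."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

def ask(base_url, model, question):
    """Send one chat request through the OpenAI Python client."""
    # Imported lazily so build_messages stays usable without the package.
    from openai import OpenAI

    # Assumption: no API key is enforced; "EMPTY" is only a placeholder.
    client = OpenAI(base_url=base_url.rstrip("/") + "/v1", api_key="EMPTY")
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(question),
    )
    return completion.choices[0].message.content

messages = build_messages("What is the capital of Korea?")
```

Calling `ask(endpoint, "meta-llama/Meta-Llama-3.3-70B-Instruct", "What is the capital of Korea?")` with your own endpoint returns the assistant's reply as a string.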

Response

200 OK

Name	Type	Description
id	string	Unique identifier of the response
object	string	Response object’s type (example: “chat.completion”)
created	integer	Creation time (Unix timestamp, in seconds)
model	string	Name of the model used
choices	array	List of generated response options
choices[].index	integer	The index of the corresponding choice
choices[].message	object	Generated message object
choices[].message.role	string	The role of the message author (e.g., “assistant”)
choices[].message.content	string	The actual content of the generated message
choices[].message.reasoning_content	string	Reasoning content generated by the model, if any
choices[].message.tool_calls	array (optional)	Tool invocation information (may be included depending on model/settings)
choices[].finish_reason	string or null	Reason the response was terminated (e.g., “stop”, “length”, etc.)
choices[].stop_reason	object or null	Additional stop reason details
choices[].logprobs	object or null	Log probability information per token (included depending on settings)
usage	object	Token usage statistics
usage.prompt_tokens	integer	Number of tokens used in the input prompt
usage.completion_tokens	integer	Number of tokens used in the generated response
usage.total_tokens	integer	Total token count (input + output)

Table. Chat Completions API - 200 OK

Error Code

HTTP status code	ErrorCode description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Chat Completions API - Error Code

Example

{
  "id": "chatcmpl-scp-aios-chat-completions",
  "object": "chat.completion",
  "created": 1749702816,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of South Korea is Seoul.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 54,
    "total_tokens": 62,
    "completion_tokens": 8,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

Code block. Chat Completions API Response Example
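
The fields most callers need are the first choice's content, its finish reason, and the token counts. A minimal sketch, using a trimmed copy of the example response and a hypothetical helper `summarize_chat_response`, also checks that `total_tokens` equals `prompt_tokens` plus `completion_tokens`:

```python
def summarize_chat_response(response):
    """Return the first choice's text, finish reason, and a usage check."""
    choice = response["choices"][0]
    usage = response["usage"]
    expected_total = usage["prompt_tokens"] + usage["completion_tokens"]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "usage_consistent": expected_total == usage["total_tokens"],
    }

# Trimmed copy of the example response.
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The capital of South Korea is Seoul.",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 54, "total_tokens": 62, "completion_tokens": 8},
}
summary = summarize_chat_response(response)
```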

Reference

Completions API

POST /v1/completions

Overview

The Completions API is compatible with OpenAI’s Completions API and can be used with the OpenAI Python client.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM Private Endpoint`
Request Method	string	HTTP methods used in API requests	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{"model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true }`

Table. Completions API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Completions API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Completions API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Specify the model to use for generating responses			`"meta-llama/Llama-3.3-70B-Instruct"`
prompt	-	array, string	✅	User input text			`""`
echo	-	boolean	❌	Whether to include the input text in the output	false	true/false	`true`
frequency_penalty	-	number	❌	Adjust the penalty for repeated tokens	0	-2.0 ~ 2.0	`0.5`
logit_bias	-	object	❌	Adjust the probability of a specific token (example: { "100": 2.0 })	null	Key: Token ID, Value: -100 ~ 100	`{ "100": 2.0 }`
logprobs	-	integer	❌	Number of most likely tokens for which to return log probabilities at each position	null	1 ~ 5	`5`
max_completion_tokens	-	integer	❌	Limit the maximum number of generated tokens	None	0~model maximum value	`100`
max_tokens (Deprecated)	-	integer	❌	Limit the maximum number of generated tokens	None	0~model maximum value	`100`
n	-	integer	❌	Specify the number of responses to generate	1		`3`
presence_penalty	-	number	❌	Adjust the penalty for tokens in the existing text.	0	-2.0 ~ 2.0	`1.0`
seed	-	integer	❌	Specify a seed value for controlling randomness	None
stop	-	string / array / null	❌	Stop generation when a specific string appears.	null		`"\n"`
stream	-	boolean	❌	Whether to return results in streaming mode	false	true/false	`true`
stream_options	include_usage, continuous_usage_stats	object	❌	Control streaming options (e.g., whether to include usage statistics)	null		`{ "include_usage": true }`
temperature	-	number	❌	Adjust the creativity of the generation result (higher values are more random)	1	0.0 ~ 1.0	`0.7`
top_p	-	number	❌	Limit the sampling probability of tokens (higher values consider more tokens)	1	0.0 ~ 1.0	`0.9`

Table. Completions API - Body Parameters

Example

curl -X "POST" \
   {AIOS LLM Private Endpoint}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "prompt": "What is the capital of South Korea?",
    "temperature": 0.7
  }'

Code block. Completions API Request Example

Response

200 OK

Name	Type	Description
id	string	Unique identifier of the response
object	string	Response object’s type (e.g., “text_completion”)
created	integer	Creation time (Unix timestamp, in seconds)
model	string	Name of the model used
choices	array	List of generated response options
choices[].index	integer	The index of the corresponding choice
choices[].text	string	Generated text
choices[].logprobs	object	Log probability information per token (included depending on settings)
choices[].finish_reason	string or null	Reason the response was terminated (e.g., “stop”, “length”, etc.)
choices[].stop_reason	object or null	Additional stop reason details
choices[].prompt_logprobs	object or null	Log probability per input prompt token (null allowed)
usage	object	Token usage statistics
usage.prompt_tokens	number	Number of tokens used in the input prompt
usage.total_tokens	number	Total token count (input + output)
usage.completion_tokens	number	Number of tokens used in the generated response
usage.prompt_tokens_details	object	Prompt token usage details

Table. Completions API - 200 OK

Error Code

HTTP status code	ErrorCode description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Completions API - Error Code

Example

{
  "id": "cmpl-scp-aios-completions",
  "object": "text_completion",
  "created": 1749702612,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 25,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  }
}

Code block. Completions API Response Example
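
In this example `finish_reason` is `"length"`, which means generation stopped at the token limit (16 completion tokens here) rather than at a natural end, so the text is cut off mid-answer. A small check like the following sketch, with a hypothetical helper `truncated_choices`, can flag such responses so the caller can resend with a larger token limit:

```python
def truncated_choices(response):
    """Return the indexes of choices that stopped at the token limit."""
    return [
        choice["index"]
        for choice in response["choices"]
        if choice["finish_reason"] == "length"
    ]

# Trimmed copy of the example response.
response = {
    "choices": [
        {
            "index": 0,
            "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 9, "total_tokens": 25, "completion_tokens": 16},
}
cut_off = truncated_choices(response)
```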

Reference

Embedding API

POST /v1/embeddings

Overview

The Embedding API converts text into high‑dimensional vectors (embeddings), which can be used for various natural language processing (NLP) tasks such as similarity calculation between texts, clustering, and search.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM Private Endpoint`
Request Method	string	HTTP methods used in API requests	`POST`
Headers	object	Header information required for the request	`{ "accept": "application/json", "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-m3", "input": "What is the capital of France?"}`

Table. Embedding API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Embedding API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Embedding API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Specify the model to use for generating embeddings			`"sds/bge-m3"`
input	-	string, array<string>	✅	Text to embed (a single string or an array of strings)			`"What is the capital of France?"`
encoding_format	-	string	❌	Specify the format to return the embedding	float	"float", "base64"	`"float"`
truncate_prompt_tokens	-	integer	❌	Limit the number of input tokens		> 0	`100`

Table. Embedding API - Body Parameters
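
With `encoding_format` set to `"base64"`, the embedding arrives as a string instead of a list of numbers. The sketch below assumes the string is a base64 encoding of packed little-endian 32-bit floats, which is the layout used by OpenAI-compatible servers; this page does not confirm it for AIOS, so verify against a real response. The helper `decode_base64_embedding` is hypothetical.

```python
import base64
import struct

def decode_base64_embedding(encoded):
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack("<%df" % count, raw))

# Round-trip a toy vector whose values are exactly representable in float32.
toy = [0.5, -0.25, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *toy)).decode("ascii")
decoded = decode_base64_embedding(encoded)
```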

Example

curl -X "POST" \
   {AIOS LLM Private Endpoint}/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-m3",
    "input": "What is the capital of France?",
    "encoding_format": "float"
  }'

Code block. Embedding API Request Example

Response

200 OK

Name	Type	Description
id	string	Unique identifier of the response
object	string	Response object’s type (example: “list” )
created	number	Creation time (Unix timestamp, in seconds)
model	string	Name of the model used
data	array	Array of objects containing embedding results
data[].index	number	Order index of the input text (for example, the position of each text when multiple input texts are provided)
data[].object	string	Data item type (example: “embedding”)
data[].embedding	array	Embedding vector values of the input text (sds/bge-m3 returns a 1024-dimensional float array)
usage	object	Token usage statistics
usage.prompt_tokens	number	Number of tokens used in the input prompt
usage.total_tokens	number	Total token count (input + output)
usage.completion_tokens	number	Number of tokens used in the generated response
usage.prompt_tokens_details	object	Prompt token details

Table. Embedding API - 200 OK

Error Code

HTTP status code	ErrorCode description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Embedding API - Error Code

Example

{
  "id":"embd-scp-aios-embeddings",
  "object":"list",
  "created":1749035024,
  "model":"sds/bge-m3",
  "data":[
    {
      "index":0,
      "object":"embedding",
      "embedding":
      [0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
      -0 ....(omitted)

      -0.05706787109375,-0.0147705078125]
    }
  ],
  "usage":
  {
    "prompt_tokens":9,
    "total_tokens":9,
    "completion_tokens":0,
    "prompt_tokens_details":null
  }
}

Code block. Embedding API Response Example
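
When several texts are sent in one request, each entry in `data` carries the index of the input it belongs to. A minimal sketch of restoring input order is shown below; the helper `embeddings_in_input_order` is hypothetical, and the two-dimensional vectors are toy values standing in for real 1024-dimensional embeddings.

```python
def embeddings_in_input_order(response):
    """Return embedding vectors sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

# Toy response with entries deliberately out of order.
response = {
    "object": "list",
    "data": [
        {"index": 1, "object": "embedding", "embedding": [0.3, 0.4]},
        {"index": 0, "object": "embedding", "embedding": [0.1, 0.2]},
    ],
}
vectors = embeddings_in_input_order(response)
```

After this step `vectors[i]` corresponds to the i-th input text, which makes it safe to zip the vectors with the original inputs.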

Reference
