API Reference
API Reference Overview
AIOS supports the following APIs.
| API name | API | Detailed description |
|---|---|---|
| Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a document list. |
| Score API | POST /score, /v1/score | Predicts the similarity between two sentences. |
| Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI's Chat Completions API; can be used with the OpenAI Python client. |
| Completions API | POST /v1/completions | Compatible with OpenAI's Completions API; can be used with the OpenAI Python client. |
| Embedding API | POST /v1/embeddings | Converts text into high-dimensional vectors (embeddings) for natural language processing (NLP) tasks such as text similarity calculation, clustering, and search. |
Table. AIOS supported API list
Rerank API
POST /rerank, /v1/rerank, /v2/rerank
Overview
The Rerank API predicts the relevance between a single query and each item in a document list by applying an embedding model or a cross-encoder model.
The score for each query-document pair generally represents their similarity on a scale from 0 to 1.
- Embedding-based model: Converts the query and each document into vectors, then computes the score from the similarity between the vectors (e.g., cosine similarity).
- Reranker (cross-encoder) based model: Feeds each query-document pair into the model together and scores the pair directly.
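For the embedding-based path, the score comes from vector similarity. The following minimal Python sketch shows how cosine similarity compares two embedding vectors. It is illustrative only: the toy vectors are hypothetical, and raw cosine similarity ranges from -1 to 1, so how the service maps it onto the 0-to-1 scale described above is not shown here.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Return the cosine similarity of two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for a query embedding and two document embeddings.
query = [0.2, 0.8, 0.1]
doc_close = [0.25, 0.75, 0.05]
doc_far = [0.9, -0.1, 0.4]

scores = {
    "doc_close": cosine_similarity(query, doc_close),
    "doc_far": cosine_similarity(query, doc_far),
}
print(scores)
```

The document whose vector points in nearly the same direction as the query (`doc_close`) receives the higher score.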
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP methods used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |
Table. Re-rank API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Re-rank API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model to use for reranking | | | "sds/bge-reranker-v2-m3" |
| query | - | string | ✅ | User's search query or question | | | "What is the capital of France?" |
| documents | - | array | ✅ | List of documents to be reranked against the query | | Maximum model input length limit | ["The capital of France is Paris."] |
| top_n | - | integer | ❌ | Number of top-ranked documents to return (0 returns all) | 0 | > 0 | 5 |
| truncate_prompt_tokens | - | integer | ❌ | Limit the number of input tokens | | > 0 | 100 |
Table. Re-rank API - Body Parameters
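The body parameters above can be assembled and checked on the client before a request is sent. The sketch below uses only the Python standard library; the helper names (`build_rerank_payload`, `post_json`) are hypothetical, the URL in the commented-out call is a placeholder for your AIOS LLM private endpoint, and the checks mirror only the constraints listed in the table.

```python
import json
import urllib.request


def build_rerank_payload(model, query, documents, top_n=0, truncate_prompt_tokens=None):
    """Assemble a Rerank API request body from the documented parameters."""
    if not model or not query:
        raise ValueError("model and query are required")
    if not documents:
        raise ValueError("documents must be a non-empty list")
    if top_n < 0:
        raise ValueError("top_n must be >= 0 (0 returns all documents)")
    payload = {"model": model, "query": query, "documents": list(documents), "top_n": top_n}
    if truncate_prompt_tokens is not None:
        if truncate_prompt_tokens <= 0:
            raise ValueError("truncate_prompt_tokens must be > 0")
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return payload


def post_json(url, payload, timeout=30):
    """POST a JSON body and return the decoded JSON response (not called below)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_rerank_payload(
    model="sds/bge-reranker-v2-m3",
    query="What is the capital of France?",
    documents=[
        "The capital of France is Paris.",
        "France capital city is known for the Eiffel Tower.",
    ],
    top_n=2,
    truncate_prompt_tokens=512,
)
print(json.dumps(payload, indent=2))
# To send it (requires network access; the URL is a placeholder):
# result = post_json("http://<AIOS LLM private endpoint>/rerank", payload)
```

Validating locally only catches the documented range errors early; the server still returns 400 or 422 for anything it rejects.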
Example
curl -X "POST" \
{AIOS LLM private endpoint}/rerank \
-H "Content-Type: application/json" \
-d '{
"model": "sds/bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"The capital of France is Paris.",
"France capital city is known for the Eiffel Tower.",
"Paris is located in the north-central part of France."
],
"top_n": 2,
"truncate_prompt_tokens": 512
}'
Code block. Re-Rank API Request Example
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the API response (UUID format) |
| model | string | Name of the model that generated the result |
| usage | object | Object containing resource usage information for the request |
| usage.total_tokens | integer | Total number of tokens used for request processing |
| results | array | Array of results for the documents related to the query |
| results[].index | integer | The index number within the result array |
| results[].document | object | An object containing the contents of the retrieved document |
| results[].document.text | string | The actual text content of the retrieved document |
| results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |
Table. Re-rank API - 200 OK
Error Code
| HTTP status code | Error description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Re-rank API - Error Code
Example
{
"id": "rerank-scp-aios-rerank",
"model": "sds/bge-m3",
"usage": {
"total_tokens": 65
},
"results": [
{
"index": 0,
"document": {
"text": "The capital of France is Paris."
},
"relevance_score": 0.8291233777999878
},
{
"index": 1,
"document": {
"text": "France capital city is known for the Eiffel Tower."
},
"relevance_score": 0.6996355652809143
}
]
}
Code block. Re-Rank API Response Example
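A client usually wants the documents in relevance order. The sketch below sorts the `results` array by `relevance_score` and applies an optional cut-off, using a trimmed copy of the response example above as input. The helper name and the 0.7 threshold are hypothetical; sorting client-side is harmless if the server already returns results in order.

```python
def top_documents(response: dict, min_score: float = 0.0) -> list:
    """Return (text, relevance_score) pairs, highest score first, at or above min_score."""
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [
        (r["document"]["text"], r["relevance_score"])
        for r in ranked
        if r["relevance_score"] >= min_score
    ]


# Trimmed copy of the response example above.
example_response = {
    "results": [
        {"index": 0, "document": {"text": "The capital of France is Paris."},
         "relevance_score": 0.8291233777999878},
        {"index": 1, "document": {"text": "France capital city is known for the Eiffel Tower."},
         "relevance_score": 0.6996355652809143},
    ]
}

for text, score in top_documents(example_response, min_score=0.7):
    print(f"{score:.3f}  {text}")
```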
Score API
POST /score, /v1/score
Overview
The Score API predicts the similarity between two sentences. It calculates the score with one of two model types:
- Reranker (cross-encoder) model: Takes a sentence pair as input and directly predicts a similarity score.
- Embedding model: Generates an embedding vector for each sentence, then computes the cosine similarity between the vectors to derive the score.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP methods used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |
Table. Score API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Score API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model to use for scoring | | | "sds/bge-reranker-v2-m3" |
| encoding_format | - | string | ❌ | Format of the returned scores | float | | "float" |
| text_1 | - | string, array | ✅ | First text to compare | | Maximum input length limit of the model | "What is the capital of France?" |
| text_2 | - | string, array | ✅ | Second text to compare | | Maximum input length limit of the model | ["The capital of France is Paris."] |
| truncate_prompt_tokens | - | integer | ❌ | Limit the number of input tokens | | > 0 | 100 |
Table. Score API - Body Parameters
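The two text fields accept either a string or an array. A minimal sketch of building a Score API body follows. It assumes, as the request example suggests (two arrays in, two scores out), that when both fields are arrays they are paired element by element and must therefore have the same length; that pairing rule is an inference from the example, not a documented guarantee. The helper name is hypothetical.

```python
import json


def build_score_payload(model, text_1, text_2, encoding_format="float", truncate_prompt_tokens=None):
    """Assemble a Score API request body; text_1 and text_2 may be str or list of str."""
    if isinstance(text_1, list) and isinstance(text_2, list) and len(text_1) != len(text_2):
        # Assumption: two arrays are compared pairwise, so their lengths must match.
        raise ValueError("text_1 and text_2 arrays must have the same length")
    payload = {
        "model": model,
        "encoding_format": encoding_format,
        "text_1": text_1,
        "text_2": text_2,
    }
    if truncate_prompt_tokens is not None:
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return payload


payload = build_score_payload(
    "sds/bge-reranker-v2-m3",
    ["What is the largest planet in the solar system?", "What is the chemical symbol for water?"],
    ["Jupiter is the largest planet in the solar system.", "The chemical formula of water is H₂O."],
)
print(json.dumps(payload, indent=2))
```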
Example
curl -X "POST" \
{AIOS LLM private endpoint}/score \
-H "Content-Type: application/json" \
-d '{
"model": "sds/bge-reranker-v2-m3",
"encoding_format": "float",
"text_1": [
"What is the largest planet in the solar system?",
"What is the chemical symbol for water?"
],
"text_2": [
"Jupiter is the largest planet in the solar system.",
"The chemical formula of water is H₂O."
]
}'
Code block. Score API Request Example
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Response object type (example: "list") |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| data | array | List of score results |
| data.index | integer | Index of the item in the data array |
| data.object | string | Data item type (example: “score”) |
| data.score | number | Calculated score value, normalized to a range of 0 to 1. |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.total_tokens | integer | Total token count (input + output) |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.prompt_tokens_details | null | Prompt token details |
Table. Score API - 200 OK
Error Code
| HTTP status code | Error description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Score API - Error Code
Example
{
"id": "score-scp-aios-score",
"object": "list",
"created": 1748574112,
"model": "sds/bge-reranker-v2-m3",
"data": [
{
"index": 0,
"object": "score",
"score": 1.0
},
{
"index": 1,
"object": "score",
"score": 1.0
}
],
"usage": {
"prompt_tokens": 53,
"total_tokens": 53,
"completion_tokens": 0,
"prompt_tokens_details": null
}
}
Code block. Score API Response Example
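Each `data[].index` identifies which text pair a score belongs to. A short sketch that maps indices to scores, using a trimmed copy of the response example above (the helper name is hypothetical):

```python
def scores_by_index(response: dict) -> dict:
    """Map each data item's index to its score."""
    return {item["index"]: item["score"] for item in response["data"]}


# Trimmed copy of the response example above.
example_response = {
    "data": [
        {"index": 0, "object": "score", "score": 1.0},
        {"index": 1, "object": "score", "score": 1.0},
    ],
    "usage": {"prompt_tokens": 53, "total_tokens": 53, "completion_tokens": 0},
}

scores = scores_by_index(example_response)
for index, score in sorted(scores.items()):
    print(f"pair {index}: score={score:.2f}")
```

Keying by `index` rather than by list position keeps the mapping correct even if the items are reordered.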
Chat Completions API
POST /v1/chat/completions
Overview
The Chat Completions API is compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client.
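Because the API follows OpenAI's Chat Completions format, the `openai` Python package can be pointed at the AIOS endpoint. A minimal sketch, assuming the `openai` package (v1 or later) is installed, that `base_url` should end in `/v1` (the client appends `/chat/completions` itself), and that the endpoint does not check the API key, so a placeholder value is passed. The endpoint URL and the helper names `build_messages` and `ask` are hypothetical.

```python
from typing import Optional


def build_messages(user_prompt: str, system_prompt: Optional[str] = None) -> list:
    """Build an OpenAI-style messages list; the system prompt is optional."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def ask(base_url: str, model: str, user_prompt: str, system_prompt: Optional[str] = None) -> str:
    """Send one chat request through the openai client and return the reply text."""
    from openai import OpenAI  # imported here so build_messages works without the package

    # Assumption: the endpoint ignores the API key, so a placeholder string is used.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model=model,
        messages=build_messages(user_prompt, system_prompt),
        temperature=0.7,
    )
    return response.choices[0].message.content


messages = build_messages("What is the capital of Korea?", "You are a helpful assistant.")
print(messages)
# Example call (placeholder URL; requires network access and the openai package):
# print(ask("http://<AIOS LLM private endpoint>/v1", "meta-llama/Llama-3.3-70B-Instruct",
#           "What is the capital of Korea?", "You are a helpful assistant."))
```

If your deployment enforces authentication, replace the placeholder key with a real credential.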
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP methods used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "messages": [{ "role": "user", "content": "hello" }], "stream": true } |
Table. Chat Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Chat Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Model to use for response generation | | | "meta-llama/Llama-3.3-70B-Instruct" |
| messages | role, content | array | ✅ | List of messages containing the conversation history; each message has a role and content | | | [ { "role": "user", "content": "message" } ] |
| frequency_penalty | - | number | ❌ | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjust the probability of specific tokens (example: { "100": 2.0 }) | null | Key: Token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | boolean | ❌ | Whether to return log probabilities of the output tokens | false | true, false | true |
| max_completion_tokens | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| n | - | integer | ❌ | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjust the penalty for tokens contained in the existing text. | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specify the seed value for controlling randomness | None | | |
| stop | - | string / array / null | ❌ | Stop generation when a specific string appears. | null | | "\n" |
| stream | - | boolean | ❌ | Whether to return results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Control streaming options (e.g., whether to include usage statistics) | null | | { “include_usage”: true } |
| temperature | - | number | ❌ | Adjust the creativity of the generated output (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| tool_choice | - | string | ❌ | Controls which tool the model invokes. none: do not invoke any tool; auto: the model chooses whether to generate a message or invoke a tool; required: the model must invoke one or more tools | none when there is no tool; auto when there is a tool | | |
| tools | - | array | ❌ | List of tools the model can invoke. Only functions are supported as tools, up to 128 functions | None | | |
| top_logprobs | - | integer | ❌ | Number of most likely tokens (0 to 20) to return at each token position, each with its log probability; logprobs must be set to true | None | 0 ~ 20 | 3 |
| top_p | - | number | ❌ | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Chat Completions API - Body Parameters
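Several parameters in the table depend on one another: `top_logprobs` requires `logprobs` to be true and must be between 0 and 20, `tools` holds at most 128 functions, and `temperature` and `top_p` are documented in the range 0.0 to 1.0. The sketch below checks those documented constraints before a request is sent. The tool entry uses OpenAI's function-tool shape (`{"type": "function", "function": {...}}`), which is an assumption based on the stated OpenAI compatibility; `validate_chat_params` and the `get_weather` function are hypothetical.

```python
def validate_chat_params(body: dict) -> list:
    """Return a list of problems found in a Chat Completions request body (empty if none)."""
    problems = []
    for key in ("model", "messages"):
        if key not in body:
            problems.append(f"missing required parameter: {key}")
    top_logprobs = body.get("top_logprobs")
    if top_logprobs is not None:
        if not body.get("logprobs", False):
            problems.append("top_logprobs requires logprobs to be true")
        if not 0 <= top_logprobs <= 20:
            problems.append("top_logprobs must be between 0 and 20")
    for key in ("temperature", "top_p"):
        value = body.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{key} must be between 0.0 and 1.0")
    for key in ("frequency_penalty", "presence_penalty"):
        value = body.get(key)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{key} must be between -2.0 and 2.0")
    tools = body.get("tools") or []
    if len(tools) > 128:
        problems.append("tools supports at most 128 functions")
    if any(tool.get("type") != "function" for tool in tools):
        problems.append("only function tools are supported")
    return problems


body = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Seoul?"}],
    "temperature": 0.7,
    "logprobs": True,
    "top_logprobs": 3,
    "tool_choice": "auto",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
print(validate_chat_params(body))  # an empty list means no documented constraint is violated
```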
Example
curl -X "POST" \
{AIOS LLM private endpoint}/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/mnt/models/Meta-Llama-3.3-70B-Instruct",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is the capital of Korea?"
}
]
}'
Code block. Chat Completions API Request Example
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Response object’s type (example: “chat.completion”) |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response options |
| choices[].index | integer | The index of the corresponding choice |
| choices[].message | object | Generated message object |
| choices[].message.role | string | The role of the message author (e.g., “assistant”) |
| choices[].message.content | string | The actual content of the generated message |
| choices[].message.reasoning_content | string or null | Reasoning content generated by the model; may be null |
| choices[].message.tool_calls | array (optional) | Tool invocation information (may be included depending on model/settings) |
| choices[].finish_reason | string or null | Reason the response was terminated (e.g., “stop”, “length”, etc.) |
| choices[].stop_reason | object or null | Additional stop reason details |
| choices[].logprobs | object or null | Log probability information per token (included depending on settings) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.total_tokens | integer | Total token count (input + output) |
Table. Chat Completions API - 200 OK
Error Code
| HTTP status code | Error description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Chat Completions API - Error Code
Example
{
"id": "chatcmpl-scp-aios-chat-completions",
"object": "chat.completion",
"created": 1749702816,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"reasoning_content": null,
"content": "The capital of South Korea is Seoul.",
"tool_calls": []
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"usage": {
"prompt_tokens": 54,
"total_tokens": 62,
"completion_tokens": 8,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
Code block. Chat Completions API Response Example
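When reading a non-streaming response, it is worth checking `finish_reason`: a value of `length` means the output stopped at the token limit rather than at a natural end. A minimal sketch using a trimmed copy of the response example above; the helper name is hypothetical.

```python
def summarize_chat_response(response: dict) -> dict:
    """Extract the first choice's text and flag truncation, tool calls, and token usage."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "content": choice["message"]["content"],
        "truncated": choice["finish_reason"] == "length",
        "tool_calls": choice["message"].get("tool_calls") or [],
        "tokens": (usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"]),
    }


# Trimmed copy of the response example above.
example_response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": None,
                "content": "The capital of South Korea is Seoul.",
                "tool_calls": [],
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 54, "completion_tokens": 8, "total_tokens": 62},
}

summary = summarize_chat_response(example_response)
print(summary["content"], "| truncated:", summary["truncated"])
```

In the example, `total_tokens` (62) is the sum of `prompt_tokens` (54) and `completion_tokens` (8).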
Completions API
POST /v1/completions
Overview
The Completions API is compatible with OpenAI’s Completions API and can be used with the OpenAI Python client.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP methods used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true } |
Table. Completions API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Completions API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Specify the model to use for generating responses | | | “meta-llama/Llama-3.3-70B-Instruct” |
| prompt | - | array, string | ✅ | User input text | | | "" |
| echo | - | boolean | ❌ | Whether to include the input text in the output | false | true/false | true |
| frequency_penalty | - | number | ❌ | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | ❌ | Adjust the probability of a specific token (example: { “100”: 2.0 }) | null | Key: Token ID, Value: -100~100 | { “100”: 2.0 } |
| logprobs | - | integer | ❌ | Number of most likely tokens for which to return log probabilities at each position | null | 1 ~ 5 | 5 |
| max_completion_tokens | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0~model maximum value | 100 |
| max_tokens (Deprecated) | - | integer | ❌ | Limit the maximum number of generated tokens | None | 0~model maximum value | 100 |
| n | - | integer | ❌ | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | ❌ | Adjust the penalty for tokens in the existing text. | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | ❌ | Specify a seed value for controlling randomness | None | | |
| stop | - | string / array / null | ❌ | Stop generation when a specific string appears. | null | | "\n" |
| stream | - | boolean | ❌ | Whether to return results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | ❌ | Control streaming options (e.g., whether to include usage statistics) | null | | { “include_usage”: true } |
| temperature | - | number | ❌ | Adjust the creativity of the generation result (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| top_p | - | number | ❌ | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Completions API - Body Parameters
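Unlike the Chat Completions API, `logprobs` here is an integer (1 to 5) rather than a boolean, and `echo` includes the input text in the output. The sketch below builds a Completions request body and checks the ranges given in the table; the helper name is hypothetical, and only a subset of the parameters is covered.

```python
import json


def build_completion_payload(model, prompt, max_completion_tokens=None, temperature=None,
                             logprobs=None, echo=False, stop=None):
    """Assemble a Completions API request body, checking the documented ranges."""
    payload = {"model": model, "prompt": prompt}
    if max_completion_tokens is not None:
        if max_completion_tokens < 0:
            raise ValueError("max_completion_tokens must be >= 0")
        payload["max_completion_tokens"] = max_completion_tokens
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        payload["temperature"] = temperature
    if logprobs is not None:
        if not 1 <= logprobs <= 5:
            raise ValueError("logprobs must be between 1 and 5")
        payload["logprobs"] = logprobs
    if echo:
        payload["echo"] = True
    if stop is not None:
        payload["stop"] = stop
    return payload


payload = build_completion_payload(
    "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "What is the capital of South Korea?",
    max_completion_tokens=16,
    temperature=0.7,
    stop=["\n\n"],
)
print(json.dumps(payload))
```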
Example
curl -X "POST" \
{AIOS LLM Private Endpoint}/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"prompt": "What is the capital of South Korea?",
"temperature": 0.7
}'
Code block. Completions API Request Example
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Response object’s type (e.g., “text_completion”) |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response options |
| choices[].index | number | The index of the corresponding choice |
| choices[].text | string | Generated text |
| choices[].logprobs | object | Log probability information per token (included depending on settings) |
| choices[].finish_reason | string or null | Reason the response was terminated (e.g., “stop”, “length”, etc.) |
| choices[].stop_reason | object or null | Additional stop reason details |
| choices[].prompt_logprobs | object or null | Log probability per input prompt token (null allowed) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total token count (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Prompt token usage details |
Table. Completions API - 200 OK
Error Code
| HTTP status code | Error description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Completions API - Error Code
Example
{
"id": "cmpl-scp-aios-completions",
"object": "text_completion",
"created": 1749702612,
"model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
"choices": [
{
"index": 0,
"text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
"logprobs": null,
"finish_reason": "length",
"stop_reason": null,
"prompt_logprobs": null
}
],
"usage": {
"prompt_tokens": 9,
"total_tokens": 25,
"completion_tokens": 16,
"prompt_tokens_details": null
}
}
Code block. Completions API Response Example
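In the response example above, `finish_reason` is `length`, meaning generation stopped at the token limit (16 completion tokens) rather than at a natural end, which is why the text ends mid-list. A sketch that extracts the text and flags this case, using a trimmed copy of that example; the helper name is hypothetical.

```python
def completion_text(response: dict) -> tuple:
    """Return (stripped text of the first choice, whether it was cut off by the token limit)."""
    choice = response["choices"][0]
    return choice["text"].strip(), choice["finish_reason"] == "length"


# Trimmed copy of the response example above.
example_response = {
    "choices": [
        {
            "index": 0,
            "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 16, "total_tokens": 25},
}

text, truncated = completion_text(example_response)
if truncated:
    print("Warning: output hit the token limit; consider raising max_completion_tokens.")
print(text)
```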
Embedding API
POST /v1/embeddings
Overview
The Embedding API converts text into high‑dimensional vectors (embeddings), which can be used for various natural language processing (NLP) tasks such as similarity calculation between texts, clustering, and search.
Request
Context
| Key | Type | Description | Example |
|---|---|---|---|
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP methods used in API requests | POST |
| Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |
Table. Embedding API - Context
Path Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Path Parameters
Query Parameters
| Name | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|
| None | | | | | | |
Table. Embedding API - Query Parameters
Body Parameters
| Name | Name Sub | type | Required | Description | Default value | Boundary value | Example |
|---|---|---|---|---|---|---|---|
| model | - | string | ✅ | Embedding model to use | | | "sds/bge-m3" |
| input | - | string, array<string> | ✅ | Text to embed, such as a user's search query or question | | | "What is the capital of France?" |
| encoding_format | - | string | ❌ | Format in which the embedding is returned | float | "float", "base64" | "float" |
| truncate_prompt_tokens | - | integer | ❌ | Limit the number of input tokens | | > 0 | 100 |
Table. Embedding API - Body Parameters
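With `encoding_format` set to `base64`, the embedding arrives as a string instead of a float array. The sketch below decodes it, assuming the convention used by OpenAI-compatible servers of packing the vector as little-endian 32-bit floats; verify this against a real response (for example, by comparing with a `float` response for the same input) before relying on it. The helper name and the 3-dimensional demo vector are hypothetical.

```python
import base64
import struct


def decode_base64_embedding(encoded: str) -> list:
    """Decode a base64 string of little-endian float32 values into a list of floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes (float32)")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round-trip demo with a hypothetical vector whose values are exactly representable in float32.
original = [0.5, -0.25, 1.0]
encoded = base64.b64encode(struct.pack(f"<{len(original)}f", *original)).decode("ascii")
print(encoded)
print(decode_base64_embedding(encoded))  # [0.5, -0.25, 1.0]
```

Values that are not exactly representable in float32 (most real embedding values) round-trip only approximately.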
Example
curl -X "POST" \
{AIOS LLM Private Endpoint}/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "sds/bge-m3",
"input": "What is the capital of France?",
"encoding_format": "float"
}'
Code block. Embedding API Request Example
Response
200 OK
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier of the response |
| object | string | Response object’s type (example: “list” ) |
| created | number | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| data | array | Array of objects containing embedding results |
| data.index | number | Order index of the input text (example: indicates the order when multiple input texts are provided) |
| data.object | string | Data item type |
| data.embedding | array | Embedding vector of the input text (for sds/bge-m3, a 1024-dimensional float array) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total token count (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Prompt token details |
Table. Embedding API - 200 OK
Error Code
| HTTP status code | Error description |
|---|---|
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Embedding API - Error Code
Example
{
"id":"embd-scp-aios-embeddings",
"object":"list","created":1749035024,
"model":"sds/bge-m3",
"data":[
{
"index":0,
"object":"embedding",
"embedding":
[0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
-0 ....(omitted)
-0.05706787109375,-0.0147705078125]
}
],
"usage":
{
"prompt_tokens":9,
"total_tokens":9,
"completion_tokens":0,
"prompt_tokens_details":null
}
}
Code block. Embedding API Response Example
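When `input` is an array, each item of `data` carries an `index` giving the position of its source text. A sketch that restores input order and checks the vector dimension (1024 for sds/bge-m3, per the response table). The response below is a hypothetical, shortened stand-in with 3-dimensional vectors, not real output, and the helper name is hypothetical.

```python
def embeddings_in_input_order(response: dict, expected_dim=None) -> list:
    """Return embedding vectors ordered by data[].index, optionally checking their dimension."""
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    if expected_dim is not None:
        for i, vector in enumerate(vectors):
            if len(vector) != expected_dim:
                raise ValueError(f"embedding {i} has {len(vector)} dimensions, expected {expected_dim}")
    return vectors


# Hypothetical response for two inputs, with 3-dimensional vectors for brevity.
response = {
    "data": [
        {"index": 1, "object": "embedding", "embedding": [0.0, 1.0, 0.0]},
        {"index": 0, "object": "embedding", "embedding": [1.0, 0.0, 0.0]},
    ]
}
vectors = embeddings_in_input_order(response, expected_dim=3)
print(vectors)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

For a real sds/bge-m3 response, pass `expected_dim=1024`.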