The page has been translated by Gen AI.

API Reference

API Reference Overview

AIOS supports the following APIs.

| API name | API | Detailed description |
| --- | --- | --- |
| Rerank API | POST /rerank, /v1/rerank, /v2/rerank | Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a document list. |
| Score API | POST /score, /v1/score | Predicts the similarity of two sentences. |
| Chat Completions API | POST /v1/chat/completions | Compatible with OpenAI's Chat Completions API; can be used with the OpenAI Python client. |
| Completions API | POST /v1/completions | Compatible with OpenAI's Completions API; can be used with the OpenAI Python client. |
| Embedding API | POST /v1/embeddings | Converts text into high-dimensional vectors (embeddings) for natural language processing (NLP) tasks such as text similarity calculation, clustering, and search. |
Table. AIOS supported API list

Rerank API

POST /rerank, /v1/rerank, /v2/rerank

Overview

The Rerank API predicts the relevance between a single query and each item in a document list by applying an embedding model or a cross-encoder model. Generally, the score of a sentence pair represents the similarity between the two sentences on a scale from 0 to 1.

  • Embedding-based model: Converts the query and each document into vectors, then computes a score from the similarity between the vectors (e.g., cosine similarity).
  • Reranker (cross-encoder) based model: Feeds each query-document pair into the model together, and the model outputs the relevance score directly.
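
As a rough illustration of the embedding-based path, the sketch below scores a query vector against document vectors with cosine similarity and sorts the documents by score. The toy vectors and the `rank_by_cosine` helper are hypothetical; the models AIOS actually runs, and any score normalization they apply, are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_cosine(query, docs)
print(ranking)
```

Cosine similarity ignores vector length, so the second document (same direction as the query, twice the magnitude) ranks first.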

Request

Context

| Key | Type | Description | Example |
| --- | --- | --- | --- |
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP method used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "query": …, "documents": […] } |
Table. Re-rank API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Re-rank API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Re-rank API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model | - | string | | Specify the model to use for reranking | | | "sds/bge-reranker-v2-m3" |
| query | - | string | | User's search query or question | | | "What is the capital of France?" |
| documents | - | array | | List of documents to be reordered | | Maximum model input length limit | ["The capital of France is Paris."] |
| top_n | - | integer | | Specify the number of top-ranked documents to return (0 returns all) | 0 | > 0 | 5 |
| truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
Table. Re-rank API - Body Parameters

Example

curl -X "POST" \
   {AIOS LLM private endpoint}/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "The capital of France is Paris.",
      "France capital city is known for the Eiffel Tower.",
      "Paris is located in the north-central part of France."
    ],
    "top_n": 2, 
    "truncate_prompt_tokens": 512
  }'
Code block. Re-Rank API Request Example
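
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the `build_rerank_payload` and `post_json` helpers, the non-empty `documents` check, and the placeholder URL are assumptions made for illustration, not part of AIOS.

```python
import json
import urllib.request


def build_rerank_payload(model, query, documents, top_n=0, truncate_prompt_tokens=None):
    """Assemble the JSON body described in the body parameters table."""
    if not documents:
        # Assumption for this sketch: an empty document list is treated as a caller error.
        raise ValueError("documents must contain at least one item")
    payload = {
        "model": model,
        "query": query,
        "documents": list(documents),
        "top_n": top_n,  # 0 returns all documents
    }
    if truncate_prompt_tokens is not None:
        if truncate_prompt_tokens <= 0:
            raise ValueError("truncate_prompt_tokens must be > 0")
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return payload


def post_json(url, payload, timeout=30):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_rerank_payload(
    model="sds/bge-reranker-v2-m3",
    query="What is the capital of France?",
    documents=[
        "The capital of France is Paris.",
        "France capital city is known for the Eiffel Tower.",
        "Paris is located in the north-central part of France.",
    ],
    top_n=2,
    truncate_prompt_tokens=512,
)
print(json.dumps(payload, indent=2))

# Hypothetical call; replace the placeholder with your AIOS LLM private endpoint.
# result = post_json("https://<aios-llm-private-endpoint>/rerank", payload)
```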

Response

200 OK

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the API response (UUID format) |
| model | string | Name of the model that generated the result |
| usage | object | Object containing resource information used in the request |
| usage.total_tokens | integer | Total number of tokens used for request processing |
| results | array | An array containing the results for the documents related to the query |
| results[].index | integer | Index of the document in the input documents list |
| results[].document | object | An object containing the contents of the retrieved document |
| results[].document.text | string | The actual text content of the retrieved document |
| results[].relevance_score | float | Score indicating the relevance between the query and the document (0 ~ 1) |
Table. Re-rank API - 200 OK

Error Code

| HTTP status code | ErrorCode description |
| --- | --- |
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Re-rank API - Error Code
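
One way a client might react to these status codes is sketched below. The policy (fix the request on 400 and 422, retry on 500) is an assumption chosen for illustration and is not documented AIOS behavior; `classify_status` is a hypothetical helper.

```python
# Status codes listed in the error code table.
CLIENT_ERRORS = {400: "Bad Request", 422: "Validation Error"}
SERVER_ERRORS = {500: "Internal Server Error"}


def classify_status(status_code):
    """Map an HTTP status code to a coarse client-side action."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in CLIENT_ERRORS:
        return "fix_request"  # the same request would fail again unchanged
    if status_code in SERVER_ERRORS:
        return "retry"  # assumption: a server-side failure may be transient
    return "unexpected"


for code in (200, 400, 422, 500):
    print(code, classify_status(code))
```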

Example

{
  "id": "rerank-scp-aios-rerank",
  "model": "sds/sds/bge-m3",
  "usage": {
    "total_tokens": 65
  },
  "results": [
    {
      "index": 0,
      "document": {
        "text": "The capital of France is Paris."
      },
      "relevance_score": 0.8291233777999878
    },
    {
      "index": 1,
      "document": {
        "text": "France capital city is known for the Eiffel Tower."
      },
      "relevance_score": 0.6996355652809143
    }
  ]
}
Code block. Re-Rank API Response Example
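
A response of this shape can be reduced to a ranked list in a few lines. The sketch below is an illustration only: `top_documents` and its `min_score` cutoff are hypothetical and not part of the API.

```python
import json

# Sample response copied from the example above.
response_text = """
{
  "id": "rerank-scp-aios-rerank",
  "model": "sds/sds/bge-m3",
  "usage": {"total_tokens": 65},
  "results": [
    {"index": 0,
     "document": {"text": "The capital of France is Paris."},
     "relevance_score": 0.8291233777999878},
    {"index": 1,
     "document": {"text": "France capital city is known for the Eiffel Tower."},
     "relevance_score": 0.6996355652809143}
  ]
}
"""


def top_documents(response, min_score=0.0):
    """Return (index, text, score) tuples, highest relevance first."""
    ordered = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [
        (r["index"], r["document"]["text"], r["relevance_score"])
        for r in ordered
        if r["relevance_score"] >= min_score
    ]


response = json.loads(response_text)
all_hits = top_documents(response)
strong_hits = top_documents(response, min_score=0.7)
print(all_hits)
print(strong_hits)
```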

Score API

POST /score, /v1/score

Overview

The Score API predicts the similarity between two sentences. The score is calculated with one of two model types.

  • Reranker (cross-encoder) model: Takes a pair of sentences as input and directly predicts a similarity score.
  • Embedding model: Generates an embedding vector for each sentence, then computes the cosine similarity between the vectors to derive the score.

Request

Context

| Key | Type | Description | Example |
| --- | --- | --- | --- |
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP method used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] } |
Table. Score API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Score API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Score API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model | - | string | | Specify the model to use for scoring | | | "sds/bge-reranker-v2-m3" |
| encoding_format | - | string | | Score return format | "float" | "float" (default), "int" | "float" |
| text_1 | - | string, array | | First text to compare | | string (""); maximum input length limit of the model | "What is the capital of France?" |
| text_2 | - | string, array | | Second text to compare | | string (""); maximum input length limit of the model | ["The capital of France is Paris."] |
| truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
Table. Score API - Body Parameters

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/score \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "encoding_format": "float",
    "text_1": [
      "What is the largest planet in the solar system?",
      "What is the chemical symbol for water?"
    ],
    "text_2": [
      "Jupiter is the largest planet in the solar system.",
      "The chemical formula of water is H₂O."
    ]
  }'
Code block. Score API Request Example

Response

200 OK

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the response |
| object | string | Response object's type (example: "list") |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| data | array | List of score calculation results |
| data[].index | integer | Index of the item in the data array |
| data[].object | string | Data item type (example: "score") |
| data[].score | number | Calculated score value, normalized to a range of 0 to 1 |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.total_tokens | integer | Total token count (input + output) |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.prompt_tokens_details | null | Prompt token details |
Table. Score API - 200 OK

Error Code

| HTTP status code | ErrorCode description |
| --- | --- |
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Score API - Error Code

Example

{
  "id": "score-scp-aios-score",
  "object": "list",
  "created": 1748574112,
  "model": "sds/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": 1.0
    },
    {
      "index": 1,
      "object": "score",
      "score": 1.0
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "total_tokens": 53,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
Code block. Score API Response Example
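
To read the scores next to the sentences they belong to, the response can be joined back to the request. The sketch below assumes that `text_1` and `text_2` are lists of equal length and that `data[].index` identifies the i-th pair; both are assumptions for illustration, and `pair_scores` is a hypothetical helper.

```python
def pair_scores(text_1, text_2, response):
    """Return (first text, second text, score) tuples in pair order."""
    if len(text_1) != len(text_2):
        raise ValueError("this sketch only handles lists of equal length")
    pairs = []
    for item in sorted(response["data"], key=lambda d: d["index"]):
        i = item["index"]  # assumption: index i refers to the pair (text_1[i], text_2[i])
        pairs.append((text_1[i], text_2[i], item["score"]))
    return pairs


text_1 = [
    "What is the largest planet in the solar system?",
    "What is the chemical symbol for water?",
]
text_2 = [
    "Jupiter is the largest planet in the solar system.",
    "The chemical formula of water is H₂O.",
]
# Trimmed copy of the sample response above.
response = {
    "object": "list",
    "data": [
        {"index": 0, "object": "score", "score": 1.0},
        {"index": 1, "object": "score", "score": 1.0},
    ],
}
scored_pairs = pair_scores(text_1, text_2, response)
for first, second, score in scored_pairs:
    print(f"{score:.2f}  {first}  <->  {second}")
```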

Chat Completions API

POST /v1/chat/completions

Overview

The Chat Completions API is compatible with OpenAI's Chat Completions API and can be used with the OpenAI Python client.

Request

Context

| Key | Type | Description | Example |
| --- | --- | --- | --- |
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP method used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "messages": [{ "role": "user", "content": "hello" }], "stream": true } |
Table. Chat Completions API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Chat Completions API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Chat Completions API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model | - | string | | Specify the model to use for response generation | | | "meta-llama/Llama-3.3-70B-Instruct" |
| messages | role | string | | Message list containing the conversation history | | | [ { "role": "user", "content": "message" } ] |
| frequency_penalty | - | number | | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | | Adjust the probability of a specific token (example: { "100": 2.0 }) | null | Key: Token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | boolean | | Whether to return log probabilities of the output tokens | false | true, false | true |
| max_completion_tokens | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| max_tokens (Deprecated) | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| n | - | integer | | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | | Adjust the penalty for tokens already present in the existing text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | | Specify the seed value for controlling randomness | None | | |
| stop | - | string / array / null | | Stop generation when a specific string appears | null | | "\n" |
| stream | - | boolean | | Whether to return results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | | Control streaming options (e.g., whether to include usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | | Adjust the creativity of the generated output (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| tool_choice | - | string | | Control which tool the model invokes. none: do not invoke any tool; auto: let the model choose whether to generate a message or invoke a tool; required: the model must invoke one or more tools | none when no tools are given; auto when tools are given | | |
| tools | - | array | | List of tools the model can invoke. Only functions are supported as tools; up to 128 functions are supported | None | | |
| top_logprobs | - | integer | | Specify the number of most probable tokens to return, as an integer between 0 and 20. Each token is returned with its log probability value; logprobs must be set to true; shows the probability values for the top k candidates of the completion | None | 0 ~ 20 | 3 |
| top_p | - | number | | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Chat Completions API - Body Parameters
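
The boundary values in the table can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `validate_chat_params` is a hypothetical helper, and whether the server itself rejects out-of-range values (for example with 422) is an assumption not confirmed here.

```python
# Ranges copied from the "Boundary value" column of the body parameters table.
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "top_logprobs": (0, 20),
}


def validate_chat_params(params):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside {low} ~ {high}")
    # top_logprobs is only meaningful when logprobs is true.
    if params.get("top_logprobs") is not None and not params.get("logprobs"):
        problems.append("top_logprobs requires logprobs to be true")
    if len(params.get("tools") or []) > 128:
        problems.append("tools supports up to 128 functions")
    for token_id, bias in (params.get("logit_bias") or {}).items():
        if not (-100 <= bias <= 100):
            problems.append(f"logit_bias[{token_id}]={bias} is outside -100 ~ 100")
    return problems


good = validate_chat_params({"temperature": 0.7, "top_p": 0.9})
bad = validate_chat_params({"temperature": 1.5, "top_logprobs": 3})
print(good)
print(bad)
```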

Example

curl -X "POST" \
   {AIOS LLM private endpoint}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/mnt/models/Meta-Llama-3.3-70B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is the capital of Korea?"
      }
    ]
  }'
Code block. Chat Completions API Request Example
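
Because the endpoint is OpenAI-compatible, the request can also be made through the OpenAI Python client. This is a sketch under stated assumptions: the `openai` package is installed, the base URL passed in ends with `/v1`, and the endpoint does not check the API key (the client requires a non-empty value, so a placeholder is passed). `build_messages` and `ask` are hypothetical helper names.

```python
def build_messages(user_text, system_text="You are a helpful assistant."):
    """Build the messages list for a single-turn conversation."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]


def ask(base_url, model, user_text):
    """Send one chat request through the OpenAI Python client."""
    # Imported lazily so build_messages works even without the package installed.
    from openai import OpenAI

    # Assumptions: base_url ends with /v1 and the API key is not validated.
    client = OpenAI(base_url=base_url, api_key="EMPTY")
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(user_text),
    )
    return completion.choices[0].message.content


messages = build_messages("What is the capital of Korea?")
print(messages)

# Hypothetical call; replace the placeholder with your AIOS LLM private endpoint.
# print(ask("https://<aios-llm-private-endpoint>/v1",
#           "meta-llama/Llama-3.3-70B-Instruct",
#           "What is the capital of Korea?"))
```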

Response

200 OK

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the response |
| object | string | Response object's type (example: "chat.completion") |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response options |
| choices[].index | integer | The index of the corresponding choice |
| choices[].message | object | Generated message object |
| choices[].message.role | string | The role of the message author (e.g., "assistant") |
| choices[].message.content | string | The actual content of the generated message |
| choices[].message.reasoning_content | string or null | The content of the generated reasoning message, if any |
| choices[].message.tool_calls | array (optional) | Tool invocation information (may be included depending on model/settings) |
| choices[].finish_reason | string or null | Reason the response was terminated (e.g., "stop", "length", etc.) |
| choices[].stop_reason | object or null | Additional stop reason details |
| choices[].logprobs | object or null | Log probability information per token (included depending on settings) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | integer | Number of tokens used in the input prompt |
| usage.completion_tokens | integer | Number of tokens used in the generated response |
| usage.total_tokens | integer | Total token count (input + output) |
Table. Chat Completions API - 200 OK

Error Code

| HTTP status code | ErrorCode description |
| --- | --- |
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Chat Completions API - Error Code

Example

{
  "id": "chatcmpl-scp-aios-chat-completions",
  "object": "chat.completion",
  "created": 1749702816,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "The capital of South Korea is Seoul.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 54,
    "total_tokens": 62,
    "completion_tokens": 8,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
Code block. Chat Completions API Response Example
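
The fields most callers need from this response are the message text, the finish reason, any tool calls, and the token usage. The sketch below pulls them out of a trimmed copy of the sample; `summarize_chat_response` is a hypothetical helper, not part of the API.

```python
def summarize_chat_response(response, choice_index=0):
    """Collect the commonly used fields of one choice into a flat dict."""
    choice = response["choices"][choice_index]
    message = choice["message"]
    return {
        "text": message.get("content"),
        "finish_reason": choice.get("finish_reason"),
        "tool_calls": message.get("tool_calls") or [],
        "total_tokens": response["usage"]["total_tokens"],
    }


# Trimmed copy of the sample response, written as a Python dict.
sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": None,
                "content": "The capital of South Korea is Seoul.",
                "tool_calls": [],
            },
            "logprobs": None,
            "finish_reason": "stop",
            "stop_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 54, "total_tokens": 62, "completion_tokens": 8},
}
summary = summarize_chat_response(sample)
print(summary)
```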

Completions API

POST /v1/completions

Overview

The Completions API is compatible with OpenAI’s Completions API and can be used with the OpenAI Python client.

Request

Context

| Key | Type | Description | Example |
| --- | --- | --- | --- |
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP method used in API requests | POST |
| Headers | object | Header information required for the request | { "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true } |
Table. Completions API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Completions API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Completions API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model | - | string | | Specify the model to use for generating responses | | | "meta-llama/Llama-3.3-70B-Instruct" |
| prompt | - | array, string | | User input text | | | "" |
| echo | - | boolean | | Whether to include the input text in the output | false | true/false | true |
| frequency_penalty | - | number | | Adjust the penalty for repeated tokens | 0 | -2.0 ~ 2.0 | 0.5 |
| logit_bias | - | object | | Adjust the probability of a specific token (example: { "100": 2.0 }) | null | Key: Token ID, Value: -100 ~ 100 | { "100": 2.0 } |
| logprobs | - | integer | | Return log probabilities for the given number of most likely tokens | null | 1 ~ 5 | 5 |
| max_completion_tokens | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| max_tokens (Deprecated) | - | integer | | Limit the maximum number of generated tokens | None | 0 ~ model maximum value | 100 |
| n | - | integer | | Specify the number of responses to generate | 1 | | 3 |
| presence_penalty | - | number | | Adjust the penalty for tokens already present in the existing text | 0 | -2.0 ~ 2.0 | 1.0 |
| seed | - | integer | | Specify a seed value for controlling randomness | None | | |
| stop | - | string / array / null | | Stop generation when a specific string appears | null | | "\n" |
| stream | - | boolean | | Whether to return results in streaming mode | false | true/false | true |
| stream_options | include_usage, continuous_usage_stats | object | | Control streaming options (e.g., whether to include usage statistics) | null | | { "include_usage": true } |
| temperature | - | number | | Adjust the creativity of the generation result (higher values are more random) | 1 | 0.0 ~ 1.0 | 0.7 |
| top_p | - | number | | Limit the sampling probability of tokens (higher values consider more tokens) | 1 | 0.0 ~ 1.0 | 0.9 |
Table. Completions API - Body Parameters

Example

curl -X "POST" \
   {AIOS LLM Private Endpoint}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "prompt": "What is the capital of South Korea?",
    "temperature": 0.7
  }'
Code block. Completions API Request Example

Response

200 OK

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the response |
| object | string | Response object's type (e.g., "text_completion") |
| created | integer | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| choices | array | List of generated response options |
| choices[].index | number | The index of the corresponding choice |
| choices[].text | string | Generated text |
| choices[].logprobs | object | Log probability information per token (included depending on settings) |
| choices[].finish_reason | string or null | Reason the response was terminated (e.g., "stop", "length", etc.) |
| choices[].stop_reason | object or null | Additional stop reason details |
| choices[].prompt_logprobs | object or null | Log probability per input prompt token (null allowed) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total token count (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Prompt token usage details |
Table. Completions API - 200 OK

Error Code

| HTTP status code | ErrorCode description |
| --- | --- |
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Completions API - Error Code

Example

{
  "id": "cmpl-scp-aios-completions",
  "object": "text_completion",
  "created": 1749702612,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 25,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  }
}
Code block. Completions API Response Example
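
In the sample above, `finish_reason` is "length", which means generation stopped at the token limit and the text is cut off mid-answer. The sketch below flags that case and checks the usage arithmetic from the table (input + output = total); `check_completion` is a hypothetical helper written for illustration.

```python
def check_completion(response, choice_index=0):
    """Report whether the text was cut off and whether usage adds up."""
    usage = response["usage"]
    choice = response["choices"][choice_index]
    return {
        "text": choice["text"],
        # "length" means the token limit was reached before a natural stop.
        "hit_length_limit": choice["finish_reason"] == "length",
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
        ),
    }


# Trimmed copy of the sample response, written as a Python dict.
sample = {
    "object": "text_completion",
    "choices": [
        {
            "index": 0,
            "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
            "logprobs": None,
            "finish_reason": "length",
            "stop_reason": None,
        }
    ],
    "usage": {"prompt_tokens": 9, "total_tokens": 25, "completion_tokens": 16},
}
report = check_completion(sample)
print(report)
```

When `hit_length_limit` is true, raising the token limit or adding a `stop` sequence are the usual remedies.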

Embedding API

POST /v1/embeddings

Overview

The Embedding API converts text into high‑dimensional vectors (embeddings), which can be used for various natural language processing (NLP) tasks such as similarity calculation between texts, clustering, and search.

Request

Context

| Key | Type | Description | Example |
| --- | --- | --- | --- |
| Base URL | string | AIOS URL for API requests | AIOS LLM Private Endpoint |
| Request Method | string | HTTP method used in API requests | POST |
| Headers | object | Header information required for the request | { "accept": "application/json", "Content-Type": "application/json" } |
| Body Parameters | object | Parameters included in the request body | { "model": "sds/bge-m3", "input": "What is the capital of France?" } |
Table. Embedding API - Context

Path Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Embedding API - Path Parameters

Query Parameters

| Name | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- |
| None | | | | | | |
Table. Embedding API - Query Parameters

Body Parameters

| Name | Name Sub | Type | Required | Description | Default value | Boundary value | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| model | - | string | | Specify the model to use for generating embeddings | | | "sds/bge-m3" |
| input | - | array<string> | | Input text to embed (for example, the user's search query or question) | | | "What is the capital of France?" |
| encoding_format | - | string | | Specify the format in which to return the embedding | "float" | "float", "base64" | [0.01319122314453125, 0.057220458984375, … (omitted) |
| truncate_prompt_tokens | - | integer | | Limit the number of input tokens | | > 0 | 100 |
Table. Embedding API - Body Parameters

Example

curl -X "POST" \
   {AIOS LLM Private Endpoint}/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-m3",
    "input": "What is the capital of France?",
    "encoding_format": "float"
  }'
Code block. Embedding API Request Example

Response

200 OK

| Name | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier of the response |
| object | string | Response object's type (example: "list") |
| created | number | Creation time (Unix timestamp, in seconds) |
| model | string | Name of the model used |
| data | array | Array of objects containing embedding results |
| data[].index | number | Order index of the input text (indicates the order when multiple input texts are provided) |
| data[].object | string | Data item type |
| data[].embedding | array | Embedding vector values of the input text (for sds/bge-m3, a 1024-dimensional float array) |
| usage | object | Token usage statistics |
| usage.prompt_tokens | number | Number of tokens used in the input prompt |
| usage.total_tokens | number | Total token count (input + output) |
| usage.completion_tokens | number | Number of tokens used in the generated response |
| usage.prompt_tokens_details | object | Prompt token details |
Table. Embedding API - 200 OK

Error Code

| HTTP status code | ErrorCode description |
| --- | --- |
| 400 | Bad Request |
| 422 | Validation Error |
| 500 | Internal Server Error |
Table. Embedding API - Error Code

Example

{
  "id":"embd-scp-aios-embeddings",
  "object":"list","created":1749035024,
  "model":"sds/bge-m3",
  "data":[
    {
      "index":0,
      "object":"embedding",
      "embedding":
      [0.01319122314453125,0.057220458984375,-0.028533935546875,-0.0008697509765625,-0.01422119140625,0.033416748046875,-0.0062408447265625,-0.04364013671875,-0.004497528076171875,0.0008072853088378906,-0.0193328857421875,0.041168212890625,-0.019317626953125,-0.0188751220703125,-0.047088623046875,
      -0 ....(omitted)

      -0.05706787109375,-0.0147705078125]
    }
  ],
  "usage":
  {
    "prompt_tokens":9,
    "total_tokens":9,
    "completion_tokens":0,
    "prompt_tokens_details":null
  }
}
Code block. Embedding API Response Example
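
The vectors can be collected in input order from `data[]`. The sketch below also handles `encoding_format` "base64" under the assumption that the payload is a sequence of packed little-endian float32 values, as in OpenAI's API; this page does not confirm the byte layout for AIOS. `decode_embedding` and `embeddings_in_input_order` are hypothetical helpers.

```python
import base64
import struct


def decode_embedding(value):
    """Accept a list of floats ("float") or a base64 string ("base64")."""
    if isinstance(value, str):
        raw = base64.b64decode(value)
        count = len(raw) // 4
        # Assumption: base64 payloads hold little-endian float32 values.
        return list(struct.unpack(f"<{count}f", raw))
    return [float(x) for x in value]


def embeddings_in_input_order(response):
    """Return one vector per input text, ordered by data[].index."""
    items = sorted(response["data"], key=lambda d: d["index"])
    return [decode_embedding(item["embedding"]) for item in items]


# Toy response with short vectors; real sds/bge-m3 vectors have 1024 values.
encoded = base64.b64encode(struct.pack("<3f", 0.5, -0.25, 1.0)).decode("ascii")
toy_response = {
    "object": "list",
    "data": [
        {"index": 1, "object": "embedding", "embedding": encoded},
        {"index": 0, "object": "embedding", "embedding": [0.5, 0.0]},
    ],
}
vectors = embeddings_in_input_order(toy_response)
print(vectors)
```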
