API Reference

API Reference Overview

AIOS supports the following APIs.

API name	API	Description
Rerank API	POST /rerank, /v1/rerank, /v2/rerank	Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a list of documents.
Score API	POST /score, /v1/score	Predicts the similarity between two sentences.
Chat Completions API	POST /v1/chat/completions	Compatible with OpenAI's Chat Completions API and usable from the OpenAI Python client.
Completions API	POST /v1/completions	Compatible with OpenAI's Completions API and usable from the OpenAI Python client.
Embedding API	POST /v1/embeddings	Converts text into high-dimensional vectors (embeddings) for natural language processing (NLP) tasks such as text similarity calculation, clustering, and search.

Table. APIs supported by AIOS

Rerank API

POST /rerank, /v1/rerank, /v2/rerank

Overview

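The Rerank API applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a list of documents. The score for a sentence pair typically expresses the similarity between the two sentences as a value from 0 to 1.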

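Embedding-based models: convert the query and each document into vectors separately, then compute the score by measuring the similarity between the vectors (for example, cosine similarity).
Reranker (cross-encoder) models: feed the query and a document into the model together as a pair and evaluate them jointly.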
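To make the embedding-based approach concrete, the following Python sketch ranks documents by cosine similarity to the query. It assumes you already have the query and document vectors (for example, from the Embedding API); `cosine_similarity` and `rerank_by_embedding` are hypothetical helpers, not part of AIOS. A cross-encoder reranker cannot be reproduced this way, because it scores each (query, document) pair inside the model.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_by_embedding(
    query_vec: List[float], doc_vecs: List[List[float]], top_n: int = 0
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, highest similarity first.

    top_n <= 0 returns every document, like the Rerank API's top_n default of 0.
    """
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n <= 0 else scored[:top_n]


# Toy 2-D vectors standing in for real embeddings.
print(rerank_by_embedding([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]], top_n=2))
```

Real embedding scores depend on the model, so the toy values only illustrate the ordering logic.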

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM private endpoint`
Request Method	string	HTTP method used for the API request	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-m3", "query": …, "documents": […] }`

Table. Rerank API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Rerank API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Rerank API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model to use for reranking			`"sds/bge-reranker-v2-m3"`
query	-	string	✅	The user's search query or question			`"What is the capital of France?"`
documents	-	array	✅	List of documents to rerank		Limited by the model's maximum input length	`["The capital of France is Paris."]`
top_n	-	integer	❌	Number of top-ranked documents to return (0 returns all documents)	0	≥ 0	`5`
truncate_prompt_tokens	-	integer	❌	Limits the number of input tokens		> 0	`100`

Table. Rerank API - Body Parameters

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "The capital of France is Paris.",
      "France capital city is known for the Eiffel Tower.",
      "Paris is located in the north-central part of France."
    ],
    "top_n": 2,
    "truncate_prompt_tokens": 512
  }'

Code block. Rerank API Request Example
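The same request can be built from Python with only the standard library. This is a minimal sketch: `BASE_URL` is a hypothetical placeholder for your AIOS LLM private endpoint, and `build_rerank_request` and `send` are illustrative helpers. Optional fields are added only when set, so the server-side defaults apply otherwise.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with your AIOS LLM private endpoint.
BASE_URL = "http://aios-endpoint.example"


def build_rerank_request(base_url, model, query, documents,
                         top_n=None, truncate_prompt_tokens=None):
    """Build (but do not send) a POST /rerank request with the documented body fields."""
    body = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        body["top_n"] = top_n
    if truncate_prompt_tokens is not None:
        body["truncate_prompt_tokens"] = truncate_prompt_tokens
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/rerank",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a reachable endpoint, `send(build_rerank_request(BASE_URL, "sds/bge-reranker-v2-m3", query, documents, top_n=2))` returns a dictionary with the fields listed in the Response section.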

Response

200 OK

Name	Type	Description
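id	string	Unique identifier of the API response (UUID format)
model	string	Name of the model that produced the result
usage	object	Object containing information about the resources used by the request
usage.total_tokens	integer	Total number of tokens used to process the request
results	array	Array of results for the documents relevant to the query
results[].index	integer	Position of the document in the request's documents array
results[].document	object	Object containing the content of the matched document
results[].document.text	string	Actual text of the matched document
results[].relevance_score	float	Relevance score between the query and the document (0 to 1)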

Table. Rerank API - 200 OK

Error Code

HTTP status code	Description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Rerank API - Error Code

Example

{
  "id": "rerank-scp-aios-rerank",
  "model": "sds/bge-reranker-v2-m3",
  "usage": {
    "total_tokens": 65
  },
  "results": [
    {
      "index": 0,
      "document": {
        "text": "The capital of France is Paris."
      },
      "relevance_score": 0.8291233777999878
    },
    {
      "index": 1,
      "document": {
        "text": "France capital city is known for the Eiffel Tower."
      },
      "relevance_score": 0.6996355652809143
    }
  ]
}

Code block. Rerank API Response Example
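Because `top_n` can limit the results and they are ordered by score, a client usually maps each result back to its input. The sketch below assumes that `results[].index` is the position of the document in the request's `documents` array, which matches the example above, where index 0 is the first input document. `ranked_documents` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def ranked_documents(response: Dict, documents: List[str]) -> List[Tuple[str, float]]:
    """Pair each result with its input document and sort by relevance_score.

    Assumes results[].index is the document's position in the request's documents list.
    """
    pairs = [
        (documents[r["index"]], float(r["relevance_score"]))
        for r in response.get("results", [])
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

Looking documents up by index, rather than reading `document.text`, still works if a server omits the document text from its results.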

Reference

vLLM documentation: Rerank API

Score API

POST /score, /v1/score

Overview

The Score API predicts the similarity between two sentences. It computes the score using one of two model types:

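Reranker (cross-encoder) models: take a sentence pair as input and directly predict a similarity score.
Embedding models: generate an embedding vector for each sentence, then compute the cosine similarity between the vectors to derive the score.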

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM private endpoint`
Request Method	string	HTTP method used for the API request	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-reranker-v2-m3", "text_1": […], "text_2": […] }`

Table. Score API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Score API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Score API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model to use for scoring			`"sds/bge-reranker-v2-m3"`
encoding_format	-	string	❌	Format of the returned scores	"float"	"float" (default), "int"	`"float"`
text_1	-	string, array	✅	First text(s) to compare		Limited by the model's maximum input length	`"What is the capital of France?"`
text_2	-	string, array	✅	Second text(s) to compare		Limited by the model's maximum input length	`["The capital of France is Paris."]`
truncate_prompt_tokens	-	integer	❌	Limits the number of input tokens		> 0	`100`

Table. Score API - Body Parameters

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/score \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-reranker-v2-m3",
    "encoding_format": "float",
    "text_1": [
      "태양계에서 가장 큰 행성은 무엇인가요?",
      "물의 화학 기호는 무엇인가요?"
    ],
    "text_2": [
      "목성은 태양계에서 가장 큰 행성입니다.",
      "물의 화학 기호는 H₂O입니다."
    ]
  }'

Code block. Score API Request Example

Response

200 OK

Name	Type	Description
id	string	응답의 고유 식별자
object	string	응답 객체의 타입(예시: “list” )
created	integer	생성 시각(Unix timestamp, 초 단위)
model	string	사용된 모델의 이름
data	array	점수 계산 결과 목록
data.index	integer	데이터 배열 내 해당 항목의 인덱스
data.object	string	데이터 항목 타입(예시: “score”)
data.score	number	계산된 점수 값, 범위는 0 ~ 1로 정규화 값
usage	object	토큰 사용량 통계
usage.prompt_tokens	integer	입력 프롬프트에 사용된 토큰 수
usage.total_tokens	integer	전체 토큰 수(입력 + 출력)
usage.completion_tokens	integer	생성된 응답에 사용된 토큰 수
usage.prompt_tokens_details	null	프롬프트 토큰의 세부 정보

Table. Score API - 200 OK

Error Code

HTTP status code	Description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Score API - Error Code

Example

{
  "id": "score-scp-aios-score",
  "object": "list",
  "created": 1748574112,
  "model": "sds/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": 1.0
    },
    {
      "index": 1,
      "object": "score",
      "score": 1.0
    }
  ],
  "usage": {
    "prompt_tokens": 53,
    "total_tokens": 53,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}

Code block. Score API Response Example
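Each `data[].score` corresponds to one (text_1, text_2) pair. The sketch below assumes vLLM-style pairing rules, which this page does not state explicitly: a single `text_1` is compared with every `text_2` (1:N), and two lists of equal length are compared element by element (N:N), as in the request example above. `score_pairs` and `attach_scores` are hypothetical helpers.

```python
from typing import Dict, List, Tuple, Union

Text = Union[str, List[str]]


def score_pairs(text_1: Text, text_2: Text) -> List[Tuple[str, str]]:
    """Expand text_1/text_2 into the (text_1, text_2) pairs the scores refer to.

    Assumption (vLLM-style semantics): 1:1, 1:N (a single text_1 is compared
    with every text_2), or N:N (element-wise pairs of equal-length lists).
    """
    t1 = [text_1] if isinstance(text_1, str) else list(text_1)
    t2 = [text_2] if isinstance(text_2, str) else list(text_2)
    if len(t1) == 1:
        t1 = t1 * len(t2)  # broadcast a single text_1 over all text_2 entries
    if len(t1) != len(t2):
        raise ValueError("input lengths must be 1:1, 1:N or N:N")
    return list(zip(t1, t2))


def attach_scores(pairs: List[Tuple[str, str]], response: Dict) -> List[Tuple[str, str, float]]:
    """Join each data[].score with its pair, matching on data[].index."""
    by_index = {item["index"]: float(item["score"]) for item in response["data"]}
    return [(a, b, by_index[i]) for i, (a, b) in enumerate(pairs)]
```

Matching on `index` rather than on list position keeps the join correct even if entries in `data` arrive out of order.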

Reference

vLLM documentation: Score API

Chat Completions API

POST /v1/chat/completions

Overview

The Chat Completions API is compatible with OpenAI's Chat Completions API and can be used from the OpenAI Python client.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM private endpoint`
Request Method	string	HTTP method used for the API request	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "meta-llama/Llama-3.3-70B-Instruct", "messages": [{ "role": "user", "content": "hello" }], "stream": true }`

Table. Chat Completions API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Chat Completions API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Chat Completions API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model to use for generating the response			`"meta-llama/Llama-3.3-70B-Instruct"`
messages	role, content	array	✅	List of messages that make up the conversation so far			`[ { "role": "user", "content": "message" } ]`
frequency_penalty	-	number	❌	Adjusts the penalty applied to repeated tokens	0	-2.0 ~ 2.0	`0.5`
logit_bias	-	object	❌	Adjusts the likelihood of specific tokens (e.g., { "100": 2.0 })	null	Key: token ID, Value: -100 ~ 100	`{ "100": 2.0 }`
logprobs	-	boolean	❌	Whether to return log probabilities of the output tokens	false	true, false	`true`
max_completion_tokens	-	integer	❌	Maximum number of tokens to generate	None	0 ~ model maximum	`100`
max_tokens (Deprecated)	-	integer	❌	Maximum number of tokens to generate (use max_completion_tokens instead)	None	0 ~ model maximum	`100`
n	-	integer	❌	Number of responses to generate	1		`3`
presence_penalty	-	number	❌	Adjusts the penalty for tokens that already appear in the text	0	-2.0 ~ 2.0	`1.0`
seed	-	integer	❌	Seed value for controlling randomness	None
stop	-	string / array / null	❌	Stops generation when any of the given strings appears	null		`"\n"`
stream	-	boolean	❌	Whether to return results as a stream	false	true/false	`true`
stream_options	include_usage, continuous_usage_stats	object	❌	Controls streaming options (e.g., whether to include usage statistics)	null		`{ "include_usage": true }`
temperature	-	number	❌	Controls the randomness of the output (higher values are more random)	1	0.0 ~ 1.0	`0.7`
tool_choice	-	string	❌	Controls which tool, if any, the model calls. none: the model calls no tool. auto: the model chooses between generating a message and calling tools. required: the model calls one or more tools.	none when no tools are provided; auto when tools are provided
tools	-	array	❌	List of tools the model may call. Only functions are supported as tools, up to 128 functions.	None
top_logprobs	-	integer	❌	Number of most likely tokens (0 to 20) to return at each token position, each with its log probability. Requires logprobs to be set to true.	None	0 ~ 20	`3`
top_p	-	number	❌	Limits sampling to the most probable tokens (higher values consider more tokens)	1	0.0 ~ 1.0	`0.9`

Table. Chat Completions API - Body Parameters
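Several constraints in the table above can be checked on the client before a request is sent. The following sketch encodes the documented ranges. It assumes the server enforces exactly these bounds, so the server's 400/422 responses remain authoritative. `check_chat_params` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Ranges transcribed from the Body Parameters table (assumed to match server validation).
RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "top_logprobs": (0, 20),
}


def check_chat_params(body: Dict[str, Any]) -> List[str]:
    """Return issues found in a chat request body (an empty list if none)."""
    issues = []
    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is not None and not (low <= value <= high):
            issues.append(f"{name}={value} is outside [{low}, {high}]")
    if body.get("top_logprobs") is not None and not body.get("logprobs"):
        issues.append("top_logprobs requires logprobs=true")
    if len(body.get("tools") or []) > 128:
        issues.append("at most 128 functions are supported in tools")
    if "max_tokens" in body:
        issues.append("max_tokens is deprecated; use max_completion_tokens")
    return issues
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.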

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "한국의 수도는 어디입니까?"
      }
    ]
  }'

Code block. Chat Completions API Request Example
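Chat requests carry the whole conversation in `messages`, optionally starting with a `system` message. Assuming the endpoint is stateless like other OpenAI-compatible servers, the client must resend earlier turns on every call. The hypothetical `Conversation` class below keeps that history. `MODEL` is a placeholder for a model your endpoint actually serves.

```python
from typing import Any, Dict, List

# Hypothetical model name; use a model actually served by your AIOS endpoint.
MODEL = "meta-llama/Llama-3.3-70B-Instruct"


class Conversation:
    """Keeps the message list that is resent on every /v1/chat/completions call."""

    def __init__(self, system_prompt: str):
        self.messages: List[Dict[str, str]] = [{"role": "system", "content": system_prompt}]

    def request_body(self, user_text: str, **params: Any) -> Dict[str, Any]:
        """Append the user turn and return the JSON body for the next request."""
        self.messages.append({"role": "user", "content": user_text})
        # Copy the list so later turns do not mutate an already-built body.
        return {"model": MODEL, "messages": list(self.messages), **params}

    def record_reply(self, response: Dict[str, Any]) -> str:
        """Store the assistant message from choices[0] and return its content."""
        content = response["choices"][0]["message"].get("content") or ""
        self.messages.append({"role": "assistant", "content": content})
        return content
```

Because the API is OpenAI-compatible, the OpenAI Python client can also be pointed at the endpoint through its `base_url` argument instead of building bodies by hand.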

Response

200 OK

Name	Type	Description
id	string	응답의 고유 식별자
object	string	응답 객체의 타입(예시: “chat.completion”)
created	integer	생성 시각(Unix timestamp, 초 단위)
model	string	사용된 모델의 이름
choices	array	생성된 응답 선택지 목록
choices[].index	integer	해당 choice의 인덱스
choices[].message	object	생성된 메시지 객체
choices[].message.role	string	메시지 작성자의 역할(예시: “assistant”)
choices[].message.content	string	생성된 메시지의 실제 내용
choices[].message.reasoning_content	string	생성된 추론 메시지의 실제 내용
choices[].message.tool_calls	array (optional)	도구 호출 정보(모델/설정에 따라 포함될 수 있음)
choices[].finish_reason	string or null	응답이 종료된 이유(예시: “stop”, “length” 등)
choices[].stop_reason	object or null	추가 중단 이유 세부 정보
choices[].logprobs	object or null	토큰 별 로그 확률 정보(설정에 따라 포함)
usage	object	토큰 사용량 통계
usage.prompt_tokens	integer	입력 프롬프트에 사용된 토큰 수
usage.completion_tokens	integer	생성된 응답에 사용된 토큰 수
usage.total_tokens	integer	전체 토큰 수(입력 + 출력)

Table. Chat Completions API - 200 OK

Error Code

HTTP status code	Description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Chat Completions API - Error Code

Example

{
  "id": "chatcmpl-scp-aios-chat-completions",
  "object": "chat.completion",
  "created": 1749702816,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "한국의 수도는 서울입니다.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 54,
    "total_tokens": 62,
    "completion_tokens": 8,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

Code block. Chat Completions API Response Example
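When `stream` is `true`, the response arrives incrementally instead of as the single object shown above. This page does not document the stream format. Assuming the endpoint follows the OpenAI-compatible server-sent events format (lines of `data: {json}` carrying `choices[].delta.content`, ending with `data: [DONE]`), the chunks can be reassembled as below. `collect_stream` is a hypothetical helper that keeps only choice index 0.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Concatenate delta.content of choice 0 from SSE lines until data: [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # A final usage-only chunk (stream_options.include_usage) has no choices.
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)
```

With an HTTP response object, pass the decoded lines of the body as they arrive.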

Reference

Completions API

POST /v1/completions

Overview

The Completions API is compatible with OpenAI's Completions API and can be used from the OpenAI Python client.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM private endpoint`
Request Method	string	HTTP method used for the API request	`POST`
Headers	object	Header information required for the request	`{ "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "meta-llama/Llama-3.3-70B-Instruct", "prompt": "hello", "stream": true }`

Table. Completions API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Completions API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Completions API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Model to use for generating the response			`"meta-llama/Llama-3.3-70B-Instruct"`
prompt	-	array, string	✅	User input text			`""`
echo	-	boolean	❌	Whether to include the input text in the output	false	true/false	`true`
frequency_penalty	-	number	❌	Adjusts the penalty applied to repeated tokens	0	-2.0 ~ 2.0	`0.5`
logit_bias	-	object	❌	Adjusts the likelihood of specific tokens (e.g., { "100": 2.0 })	null	Key: token ID, Value: -100 ~ 100	`{ "100": 2.0 }`
logprobs	-	integer	❌	Returns log probabilities for the N most likely tokens at each position, where N is the value of logprobs	null	1 ~ 5	`5`
max_completion_tokens	-	integer	❌	Maximum number of tokens to generate	None	0 ~ model maximum	`100`
max_tokens (Deprecated)	-	integer	❌	Maximum number of tokens to generate	None	0 ~ model maximum	`100`
n	-	integer	❌	Number of responses to generate	1		`3`
presence_penalty	-	number	❌	Adjusts the penalty for tokens that already appear in the text	0	-2.0 ~ 2.0	`1.0`
seed	-	integer	❌	Seed value for controlling randomness	None
stop	-	string / array / null	❌	Stops generation when any of the given strings appears	null		`"\n"`
stream	-	boolean	❌	Whether to return results as a stream	false	true/false	`true`
stream_options	include_usage, continuous_usage_stats	object	❌	Controls streaming options (e.g., whether to include usage statistics)	null		`{ "include_usage": true }`
temperature	-	number	❌	Controls the randomness of the output (higher values are more random)	1	0.0 ~ 1.0	`0.7`
top_p	-	number	❌	Limits sampling to the most probable tokens (higher values consider more tokens)	1	0.0 ~ 1.0	`0.9`

Table. Completions API - Body Parameters

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
    "prompt": "한국의 수도는 어디입니까?",
    "temperature": 0.7
  }'

Code block. Completions API Request Example

Response

200 OK

Name	Type	Description
id	string	응답의 고유 식별자
object	string	응답 객체의 타입(예시: “text_completion”)
created	integer	생성 시각(Unix timestamp, 초 단위)
model	string	사용된 모델의 이름
choices	array	생성된 응답 선택지 목록
choices[].index	number	해당 choice의 인덱스
choices[].text	string	생성된 텍스트 객체
choices[].logprobs	object	토큰 별 로그 확률 정보(설정에 따라 포함)
choices[].finish_reason	string or null	응답이 종료된 이유(예시: “stop”, “length” 등)
choices[].stop_reason	object or null	추가 중단 이유 세부 정보
choices[].prompt_logprobs	object or null	입력 프롬프트 토큰별 로그 확률(널 가능)
usage	object	토큰 사용량 통계
usage.prompt_tokens	number	입력 프롬프트에 사용된 토큰 수
usage.total_tokens	number	전체 토큰 수(입력 + 출력)
usage.completion_tokens	number	생성된 응답에 사용된 토큰 수
usage.prompt_tokens_details	object	프롬프트 토큰 사용 세부 정보

Table. Completions API - 200 OK

Error Code

HTTP status code	Description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Completions API - Error Code

Example

{
  "id": "cmpl-scp-aios-completions",
  "object": "text_completion",
  "created": 1749702612,
  "model": "meta-llama/Meta-Llama-3.3-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " \nOur capital city is Seoul. \n\nA. 1\nB. ",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 25,
    "completion_tokens": 16,
    "prompt_tokens_details": null
  }
}

Code block. Completions API Response Example
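In the example above, `finish_reason` is `"length"`: generation stopped at a token limit after 16 completion tokens, so the text is cut off. The sketch below flags such choices so that a caller can retry with a larger `max_tokens`. `completion_results` is a hypothetical helper.

```python
from typing import Dict, List, Tuple


def completion_results(response: Dict) -> List[Tuple[str, bool]]:
    """Return (text, truncated) for each choice, ordered by choices[].index.

    truncated is True when finish_reason == "length", i.e. a token limit was hit.
    """
    ordered = sorted(response["choices"], key=lambda c: c["index"])
    return [(c["text"], c.get("finish_reason") == "length") for c in ordered]
```

Checking `finish_reason` is more reliable than inspecting the text, since a cut-off completion can still look like a complete sentence.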

Reference

Embedding API

POST /v1/embeddings

Overview

The Embedding API converts text into high-dimensional vectors (embeddings) for natural language processing (NLP) tasks such as text similarity calculation, clustering, and search.

Request

Context

Key	Type	Description	Example
Base URL	string	AIOS URL for API requests	`AIOS LLM private endpoint`
Request Method	string	HTTP method used for the API request	`POST`
Headers	object	Header information required for the request	`{ "accept": "application/json", "Content-Type": "application/json" }`
Body Parameters	object	Parameters included in the request body	`{ "model": "sds/bge-m3", "input": "What is the capital of France?" }`

Table. Embedding API - Context

Path Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Embedding API - Path Parameters

Query Parameters

Name	type	Required	Description	Default value	Boundary value	Example
None

Table. Embedding API - Query Parameters

Body Parameters

Name	Name Sub	type	Required	Description	Default value	Boundary value	Example
model	-	string	✅	Embedding model to use			`"sds/bge-m3"`
input	-	string, array<string>	✅	Text to embed, such as the user's search query or question			`"What is the capital of France?"`
encoding_format	-	string	❌	Format in which to return the embedding	"float"	"float", "base64"	`"float"`
truncate_prompt_tokens	-	integer	❌	Limits the number of input tokens		> 0	`100`

Table. Embedding API - Body Parameters
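With `encoding_format` set to `"base64"`, each embedding is returned as a string instead of a float array. The sketch below assumes the string encodes little-endian float32 values, as OpenAI-compatible servers such as vLLM produce. Verify this against your endpoint. `decode_base64_embedding` is a hypothetical helper.

```python
import base64
import struct
from typing import List


def decode_base64_embedding(data: str) -> List[float]:
    """Decode an embedding returned with encoding_format="base64".

    Assumption: the payload is a packed array of little-endian float32 values.
    """
    raw = base64.b64decode(data)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))
```

Base64 output is more compact on the wire than a JSON float array, which matters for 1024-dimensional vectors sent in large batches.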

Example

curl -X "POST" \
  {AIOS LLM private endpoint}/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sds/bge-m3",
    "input": "What is the capital of France?",
    "encoding_format": "float"
  }'

Code block. Embedding API Request Example

Response

200 OK

Name	Type	Description
id	string	응답의 고유 식별자
object	string	응답 객체의 타입(예시: “list” )
created	number	생성 시각(Unix timestamp, 초 단위)
model	string	사용된 모델의 이름
data	array	임베딩 결과를 담은 객체 배열
data.index	number	입력 텍스트의 순서 인덱스 (예시: 입력 텍스트가 여러 개일 경우 순서를 나타냄)
data.object	string	데이터 항목 타입
data.embedding	array	입력 텍스트의 임베딩 벡터 값 (sds-bge-m3는 1024 차원의 float 배열로 구성)
usage	object	토큰 사용량 통계
usage.prompt_tokens	number	입력 프롬프트에 사용된 토큰 수
usage.total_tokens	number	전체 토큰 수(입력 + 출력)
usage.completion_tokens	number	생성된 응답에 사용된 토큰 수
usage.prompt_tokens_details	object	프롬프트 토큰의 세부 정보

Table. Embedding API - 200 OK

Error Code

HTTP status code	Description
400	Bad Request
422	Validation Error
500	Internal Server Error

Table. Embedding API - Error Code

Example

{
  "id": "embd-scp-aios-embeddings",
  "object": "list",
  "created": 1749035024,
  "model": "sds/bge-m3",
  "data": [
    {
      "index": 0,
      "object": "embedding",
      "embedding": [
        0.01319122314453125, 0.057220458984375, -0.028533935546875, -0.0008697509765625, -0.01422119140625, 0.033416748046875, -0.0062408447265625, -0.04364013671875, -0.004497528076171875, 0.0008072853088378906, -0.0193328857421875, 0.041168212890625, -0.019317626953125, -0.0188751220703125, -0.047088623046875,
        ... (omitted) ...
        -0.05706787109375, -0.0147705078125
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}

Code block. Embedding API Response Example
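When `input` is an array, `data` contains one entry per input text. The sketch below orders the entries by `data[].index` so that row i matches input i, and checks each vector's length. The default of 1024 follows the dimension stated in the 200 OK table for sds/bge-m3. `embeddings_in_input_order` is a hypothetical helper.

```python
from typing import Dict, List


def embeddings_in_input_order(response: Dict, expected_dim: int = 1024) -> List[List[float]]:
    """Return data[].embedding sorted by data[].index, validating vector length."""
    rows = sorted(response["data"], key=lambda item: item["index"])
    vectors = [row["embedding"] for row in rows]
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(f"embedding {i} has {len(vec)} dimensions, expected {expected_dim}")
    return vectors
```

Pass a different `expected_dim` when using a model other than sds/bge-m3.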