SDK Reference Overview
AIOS models are compatible with OpenAI’s API, so they can also be used with OpenAI’s SDK. The following is a list of OpenAI- and Cohere-compatible APIs supported by the Samsung Cloud Platform AIOS service.
| API Name | API | Detailed Description | Supported SDK |
|---|---|---|---|
| Text Completion API | /v1/completions | Generates a natural sentence that follows the given input string. | OpenAI SDK |
| Conversation Completion API | /v1/chat/completions | Generates a response that follows the conversation content. | OpenAI SDK |
| Embeddings API | /v1/embeddings | Converts text into a high-dimensional vector (embedding) that can be used for various natural language processing (NLP) tasks such as text similarity calculation, clustering, and search. | OpenAI SDK |
| Rerank API | /v2/rerank | Applies an embedding model or a cross-encoder model to predict the relevance between a single query and each item in a document list. | Cohere SDK |
- The SDK Reference guide is based on a Virtual Server environment with Python installed.
- The actual execution may differ from the example in terms of token count and message content.
OpenAI SDK
Installing the openai Package
Install the OpenAI package.
pip install openai
Text Completion API
The Text Completion API generates a natural sentence that follows the given input string.
/v1/completions
Request
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.completions.create(
model=model,
prompt="Hi"
)
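Note that urljoin only appends "v1" cleanly when the base URL ends with a trailing slash; without one, the last path segment of the base is replaced. A quick check of the standard-library behavior (the hostnames here are illustrative):

```python
from urllib.parse import urljoin

# With a trailing slash, "v1" is appended to the existing path.
print(urljoin("https://example.com/serving/", "v1"))  # https://example.com/serving/v1

# Without a trailing slash, the last path segment is replaced.
print(urljoin("https://example.com/serving", "v1"))   # https://example.com/v1
```

Make sure the aios endpoint-url ends with a slash if it contains a path component.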
Response
The text field in choices contains the model’s response.
Completion(
id='cmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
CompletionChoice(
finish_reason='length',
index=0,
logprobs=None,
text=' future president of the United States, I hope you’re doing well. As a',
stop_reason=None,
prompt_logprobs=None
)
],
created=1750000000,
model='<<model>>',
object='text_completion',
...omitted...
)
Stream Request
With stream, you can receive the response token by token as the model generates it, rather than receiving the entire answer at once.
Request
Set the stream parameter value to True.
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.completions.create(
model=model,
prompt="Hi",
stream=True
)
# Receive the response as the model generates tokens.
for chunk in response:
    print(chunk)
Response
A response chunk is returned for each generated token, and each token can be checked in the text field of choices.
Completion(
id='cmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
CompletionChoice(
finish_reason=None,
index=0,
logprobs=None,
text='.',
stop_reason=None
)
],
created=1750000000,
model='<<model>>',
object='text_completion',
system_fingerprint=None,
usage=None
)
Completion(..., choices=[CompletionChoice(..., text=' I', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text="'m", ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' looking', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' for', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' a', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' way', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' to', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' check', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' if', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' a', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' specific', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' process', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' is', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' running', ...)], ...)
Completion(..., choices=[CompletionChoice(..., text=' on', ...)], ...)
Completion(..., choices=[], ...,
usage=CompletionUsage(
completion_tokens=16,
prompt_tokens=2,
total_tokens=18,
completion_tokens_details=None,
prompt_tokens_details=None
)
)
Conversation Completion API
The Conversation Completion API takes an ordered list of messages as input and responds with a message appropriate as the next turn in the current context.
/v1/chat/completions
Request
For text-only messages, you can make the call as follows:
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"}
]
)
Response
The message field in choices contains the model’s answer.
ChatCompletion(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content='Hello. How can I assist you today?',
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=[],
reasoning_content=None
),
stop_reason=None
)
],
created=1750000000,
model='<<model>>',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=10,
prompt_tokens=42,
total_tokens=52,
completion_tokens_details=None,
prompt_tokens_details=None
),
prompt_logprobs=None
)
Stream Request
Instead of waiting for the model to generate the entire answer and receiving it at once, you can use stream to receive and process the response for each token as the model generates it.
Request
Enter True as the value of the stream parameter.
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.chat.completions.create(
model=model,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hi"}
],
stream=True
)
# You can receive a response each time the model generates a token.
for chunk in response:
    print(chunk)
Response
A response chunk is returned for each generated token, and each token can be checked in the content field of delta within choices.
ChatCompletionChunk(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
delta=ChoiceDelta(
content='',
function_call=None,
refusal=None,
role='assistant',
tool_calls=None
),
finish_reason=None,
index=0,
logprobs=None
)
],
created=1750000000,
model='<<model>>',
object='chat.completion.chunk',
service_tier=None,
system_fingerprint=None,
usage=None
)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content='It', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content="'s", ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' nice', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' to', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content='meet', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' you', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content='.', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' Is', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' there', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' something', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' I', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' can', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' help', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' you', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' with', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' or', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' would', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' you', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' like', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' to', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content=' chat', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content='?', ...), ...)], ...)
ChatCompletionChunk(..., choices=[Choice(delta=ChoiceDelta(content='', ...), ...)], ...)
ChatCompletionChunk(..., choices=[], ...,
usage=CompletionUsage(
completion_tokens=23,
prompt_tokens=42,
total_tokens=65,
completion_tokens_details=None,
prompt_tokens_details=None
)
)
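When streaming, the full answer can be reconstructed by concatenating the delta.content of each chunk, skipping chunks with an empty choices list (such as the final usage-only chunk). A minimal sketch using plain dictionaries to stand in for the chunk objects (the chunk values here are illustrative):

```python
# Simulated stream chunks; the final usage-only chunk has an empty choices list.
chunks = [
    {"choices": [{"delta": {"content": ""}}]},
    {"choices": [{"delta": {"content": "It's nice"}}]},
    {"choices": [{"delta": {"content": " to meet you."}}]},
    {"choices": []},  # carries usage only
]

answer = ""
for chunk in chunks:
    if chunk["choices"]:
        # Guard against None or empty content in a chunk.
        answer += chunk["choices"][0]["delta"].get("content") or ""

print(answer)  # It's nice to meet you.
```

With the real SDK objects, you would read chunk.choices[0].delta.content instead of using dictionary access.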
Tool Calling
Tool calling provides an interface to external tools defined outside the model, allowing the model to generate responses that invoke the tool suited to the current context.
Using tool calls, you can define metadata for the functions the model can execute and let the model use them when generating answers.
Request
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # AIOS model call endpoint URL
model = "<<model>>" # AIOS model ID
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
# Function to get weather information
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current temperature for provided coordinates in celsius.",
"parameters": {
"type": "object",
"properties": {
"latitude": {"type": "number"},
"longitude": {"type": "number"}
},
"required": ["latitude", "longitude"],
"additionalProperties": False
},
"strict": True
}
}]
messages = [{"role": "user", "content": "What is the weather like in Paris today?"}]
response = client.chat.completions.create(
model=model,
messages=messages,
tools=tools  # Inform the model of the metadata of the tools that can be used.
)
Response
The message.tool_calls field in choices shows how the model decided to invoke the tool.
In the following example, you can see that the function in tool_calls uses the get_weather function and which arguments should be passed to it.
ChatCompletion(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
finish_reason='tool_calls',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content=None,
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=[
ChatCompletionMessageToolCall(
id='chatcmpl-tool-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
function=Function(
arguments='{"latitude": 48.8566, "longitude": 2.3522}',
name='get_weather'
),
type='function'
)
],
reasoning_content=None
),
stop_reason=None
)
],
created=1750000000,
model='<<model>>',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=19,
prompt_tokens=194,
total_tokens=213,
completion_tokens_details=None,
prompt_tokens_details=None
),
prompt_logprobs=None
)
Tool Message
After adding the function’s return value to the conversation as a tool message and calling the model again, the model can generate an answer that uses the result.
Request
Using function.arguments from tool_calls in the response data, you can call the actual function.
import json
# example function, always responds with 14 degrees.
def get_weather(latitude, longitude):
    return "14℃"
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
result = get_weather(args["latitude"], args["longitude"]) # "14℃"
After adding the function’s result to the conversation context as a tool message and calling the model again,
the model can generate an appropriate answer using the function’s result.
# Add the model's tool call message to messages
messages.append(response.choices[0].message)
# Add the result of the actual function call to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result)
})
response_2 = client.chat.completions.create(
model=model,
messages=messages,
# tools=tools
)
Response
ChatCompletion(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content='The current weather in Paris is 14℃.',
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=[],
reasoning_content=None
),
stop_reason=None
)
],
created=1750000000,
model='<<model>>',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=11,
prompt_tokens=74,
total_tokens=85,
completion_tokens_details=None,
prompt_tokens_details=None
),
prompt_logprobs=None
)
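The single-tool flow above can be generalized: when several tools are defined, the function name in the tool call can be dispatched through a lookup table. A minimal self-contained sketch (the get_weather stub and the argument values repeat the example above; TOOL_REGISTRY is a hypothetical name):

```python
import json

def get_weather(latitude, longitude):
    # Stub from the example above: always returns 14 degrees.
    return "14℃"

# Map tool names to the Python functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}

# Values as the model might emit them in tool_calls (illustrative).
name = "get_weather"
arguments = '{"latitude": 48.8566, "longitude": 2.3522}'

# Parse the JSON arguments and call the matching function.
result = TOOL_REGISTRY[name](**json.loads(arguments))
print(result)  # 14℃
```

In a real application, name and arguments come from tool_call.function.name and tool_call.function.arguments in the response.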
Reasoning
Request
Reasoning is supported by models that provide a reasoning value, and the content can be checked as follows:
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.chat.completions.create(
model=model,
messages=[
{"role": "user", "content": "9.11 and 9.8, which is greater?"}
],
)
Response
In the message field of choices, you can check the content and also reasoning_content, which contains the reasoning tokens.
ChatCompletion(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content='''
To determine whether 9.11 or 9.8 is larger, we compare the decimal parts since both numbers have the same whole number part (9).
1. Convert both numbers to the same decimal places for easier comparison:
- 9.11 remains as is.
- 9.8 can be written as 9.80.
2. Compare the tenths place:
- The tenths place of 9.11 is 1.
- The tenths place of 9.80 is 8.
3. Since 8 (from 9.80) is greater than 1 (from 9.11), 9.80 (or 9.8) is larger.
4. Verification by subtraction:
- Subtracting 9.11 from 9.8 gives \(9.80 - 9.11 = 0.69\), which is positive, confirming 9.8 is larger.
Thus, the larger number is \(\boxed{9.8}\).
''',
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=[],
reasoning_content="""Okay, so I need to figure out whether 9.11 is bigger than 9.8 or vice versa.
Hmm, let me think. Both numbers start with 9, so the whole number part is the same.
That means the difference must be in the decimal parts.
First, I remember that when comparing decimals, you look at the digits one by one after the decimal point.
The first digit after the decimal is the tenths place, then hundredths, and so on.
Since both numbers have 9 in the units place, I can focus on the decimal parts: 0.11 versus 0.08.
Wait, let me write them out to be clear. 9.11 is 9 and 11 hundredths, and 9.8 is 9 and 8 tenths.
Wait a second, maybe I should convert them to the same decimal places to make it easier.
So, 9.8 is the same as 9.80, right? Because adding a zero at the end of a decimal doesn't change its value.
That way, both numbers have two decimal places. Now comparing 9.11 and 9.80.
Looking at the tenths place first: 9.11 has 1 tenth, and 9.80 has 8 tenths.
Since 8 is more than 1, that means 9.80 is larger than 9.11. Wait, but hold on, the original numbers were 9.11 and 9.8. Did I make a mistake here?
Let me double-check. 9.8 is indeed equal to 9.80.
So when comparing tenths, 8 tenths (from 9.80) is definitely more than 1 tenth (from 9.11). So that would mean 9.8 is bigger.
But wait, maybe I confused the decimal places. Let me think again.
The first number after the decimal is tenths, so in 9.11, the first digit is 1 (tenths place) and the second is 1 (hundredths).
In 9.8, the first digit is 8 (tenths place), and there's no hundredths, which is equivalent to 0.
So comparing the tenths place first: 1 vs 8. Since 8 is greater than 1, 9.8 is larger.
But why did I initially think maybe 9.11 was bigger? Maybe because 11 is a two-digit number?
But no, in decimals, each position is a different place value.
The tenths place is more significant than the hundredths. So even though 11 hundredths is 0.11, that's still less than 0.8 (which is 8 tenths).
Another way to think about it: Convert both to fractions.
9.11 is 9 + 11/100, and 9.8 is 9 + 8/10.
Converting 8/10 to hundredths, that's 80/100.
So 9.8 is 9 + 80/100, whereas 9.11 is 9 + 11/100. Comparing the fractions: 80/100 is more than 11/100, so 9.8 is bigger.
Alternatively, maybe subtract them to see the difference. 9.8 minus 9.11 is 0.69, which is positive, so 9.8 is larger.
Wait, let me do the subtraction step by step.
9.8 minus 9.11:
First, write them aligned by decimal:
9.80
-9.11
-------
0.69
Yes, so the result is positive, so 9.8 is bigger.
Alternatively, if I had to do it without converting, maybe I can think in terms of money.
If these were dollars, 9.11 is $9.11 and 9.8 is $9.80.
Clearly, $9.80 is more than $9.11.
Hmm, so all these methods point to 9.8 being larger.
Maybe my initial confusion was because I saw 11 as a two, but
...omitted...
**Final Answer**
The number 9.8 is larger than 9.11. This is because when comparing the decimal parts, 0.8 (from 9.8) is greater than 0.11 (from 9.11).
Specifically, 9.8 can be written as 9.80, and comparing the tenths place (8 vs. 1) shows that 9.8 is larger.
The difference between them is 0.69, confirming that 9.8 is indeed the larger number.
**Final Answer**
\\boxed{9.8}"""
),
stop_reason=None
)
],
created=1750000000,
model='<<model>>',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=4167,
prompt_tokens=27,
total_tokens=4194,
completion_tokens_details=None,
prompt_tokens_details=None
),
prompt_logprobs=None,
kv_transfer_params=None
)
### Image to Text
For models that support **vision**, you can input an image as follows.

<div class="scp-textbox scp-textbox-type-error">
<div class="scp-textbox-title">Note</div>
<div class="scp-textbox-contents">
<p>For models that support <strong>vision</strong>, there are limitations on the size and number of input images.</p>
<p>Please refer to <a href="/en/userguide/ai_ml/aios/overview/#provided-models">Provided Models</a> for more information on image input limitations.</p>
</div>
</div>
#### Request
You can input an image with **MIME type** and **base64**.
```python
import base64
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # AIOS endpoint-url for model calls
model = "<<model>>" # Model ID for AIOS model calls
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
image_path = "image/path.jpg"
def encode_image(image_path: str):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image(image_path)
response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ]
        },
    ],
)
```
Response
The model analyzes the image and generates a text description.
ChatCompletion(
id='chatcmpl-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
choices=[
Choice(
finish_reason='stop',
index=0,
logprobs=None,
message=ChatCompletionMessage(
content="""Here's what's in the image:
* **A golden retriever puppy:** The main subject is a light-colored golden retriever puppy lying on green grass.
* **A bone:** The puppy is holding a large bone in its paws and appears to be enjoying chewing on it.
* **Grass:** The puppy is lying on a well-maintained lawn.
* **Vegetation:** Behind the puppy, there are some shrubs and other greenery.
* **Outdoor setting:** The scene is outdoors, likely a backyard.""",
refusal=None,
role='assistant',
annotations=None,
audio=None,
function_call=None,
tool_calls=[],
reasoning_content=None
),
stop_reason=106
)
],
created=1750000000,
model='<<model>>',
object='chat.completion',
service_tier=None,
system_fingerprint=None,
usage=CompletionUsage(
completion_tokens=114,
prompt_tokens=276,
total_tokens=390,
completion_tokens_details=None,
prompt_tokens_details=None
),
prompt_logprobs=None,
kv_transfer_params=None
)
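The request above hardcodes image/jpeg as the MIME type of the data URL; for other formats the prefix should match the file. The standard-library mimetypes module can build it from the file extension (a sketch; the file names and bytes are illustrative):

```python
import base64
import mimetypes

def to_data_url(image_path: str, data: bytes) -> str:
    # Guess the MIME type from the file extension, defaulting to JPEG.
    mime, _ = mimetypes.guess_type(image_path)
    return f"data:{mime or 'image/jpeg'};base64,{base64.b64encode(data).decode('utf-8')}"

print(to_data_url("photo.png", b"\x89PNG")[:21])  # data:image/png;base64
```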
Embeddings API
Embeddings converts input text into a high-dimensional vector of a fixed dimension. The generated vector can be used for various natural language processing tasks such as text similarity, clustering, and search.
/v1/embeddings
Request
from openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # AIOS endpoint-url for model calls
model = "<<model>>" # Model ID for AIOS model calls
client = OpenAI(base_url=urljoin(aios_base_url, "v1"), api_key="EMPTY_KEY")
response = client.embeddings.create(
input="What is the capital of France?",
model=model
)
Response
The data field contains the input converted into vector form.
CreateEmbeddingResponse(
data=[
Embedding(
embedding=[
0.01319122314453125,
0.057220458984375,
-0.028533935546875,
-0.0008697509765625,
-0.01422119140625,
...omitted...
],
index=0,
object='embedding'
)
],
model='<<model>>',
object='list',
usage=Usage(
prompt_tokens=9,
total_tokens=9,
completion_tokens=0,
prompt_tokens_details=None
),
id='embd-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
created=1750000000
)
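The returned embedding vectors can be compared with cosine similarity for tasks such as semantic search. A minimal sketch in plain Python (the short vectors here are illustrative, not real embeddings):

```python
import math

def cosine_similarity(a, b):
    # Dot product of the vectors divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

In practice, the arguments would be the embedding lists returned in data by the Embeddings API.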
Cohere SDK
The Rerank API is compatible with the Cohere SDK.
Installing the Cohere Package
The Cohere SDK can be used by installing the Cohere package.
pip install cohere
Rerank API
Rerank calculates the relevance between a given query and documents and ranks them. It can help improve the performance of RAG (Retrieval-Augmented Generation) applications by moving the most relevant documents to the front.
/v2/rerank
Request
import cohere
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
client = cohere.ClientV2("EMPTY_KEY", base_url=aios_base_url)
docs = [
"The capital of France is Paris.",
"France capital city is known for the Eiffel Tower.",
"Paris is located in the north-central part of France."
]
response = client.rerank(
model=model,
query="What is the capital of France?",
documents=docs,
top_n=3,
)
Response
In results, you can check the documents sorted in order of relevance to the query.
V2RerankResponse(
id='rerank-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
results=[
V2RerankResponseResultsItem(
document=V2RerankResponseResultsItemDocument(
text='The capital of France is Paris.'
),
index=0,
relevance_score=1.0
),
V2RerankResponseResultsItem(
document=V2RerankResponseResultsItemDocument(
text='France capital city is known for the Eiffel Tower.'
),
index=1,
relevance_score=1.0
),
V2RerankResponseResultsItem(
document=V2RerankResponseResultsItemDocument(
text='Paris is located in the north-central part of France.'
),
index=2,
relevance_score=0.982421875
)
],
meta=None,
model='<<model>>',
usage={'total_tokens': 62}
)
Langchain SDK
Because the LangChain SDK is built on top of the OpenAI and Cohere SDKs, the LangChain SDK can also be used.
Installing the langchain Package
The Langchain SDK can be used with the AIOS model after installing the langchain package.
pip install langchain langchain-openai langchain-cohere langchain-together
The langchain-openai package can be used to utilize the text completion API and conversation completion API.
langchain_openai.OpenAI
When the text completion model (langchain_openai.OpenAI) is invoked, the result value is generated as text.
Request
from langchain_openai import OpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
llm = OpenAI(
base_url=urljoin(aios_base_url, "v1"),
api_key="EMPTY_KEY",
model=model
)
llm.invoke("Can you introduce yourself in 5 words?")
Response
"""Hi, I'm a fun artist!
...omitted..."""
langchain_openai.ChatOpenAI
When the conversation completion model (langchain_openai.ChatOpenAI) is invoked, the result is returned as an AIMessage object.
Request
from langchain_openai import ChatOpenAI
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model calls.
model = "<<model>>" # Enter the model ID for AIOS model calls.
chat_llm = ChatOpenAI(
base_url=urljoin(aios_base_url, "v1"),
api_key="EMPTY_KEY",
model=model
)
chat_completion = chat_llm.invoke("Can you introduce yourself in 5 words?")
chat_completion.pretty_print()
Response
================================== Ai Message ==================================
I am an AI assistant.
Embeddings
Embedding models can be used through packages such as langchain-together and langchain-fireworks.
Request
from langchain_together import TogetherEmbeddings
from urllib.parse import urljoin
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model invocation.
model = "<<model>>" # Enter the model ID for AIOS model invocation.
embedding = TogetherEmbeddings(
base_url=urljoin(aios_base_url, "v1"),
api_key="EMPTY_KEY",
model=model
)
embedding.embed_query("What is the capital of France?")
Response
[
0.01319122314453125,
0.057220458984375,
-0.028533935546875,
-0.0008697509765625,
-0.01422119140625,
...omitted...
]
Rerank
Rerank models can be used through CohereRerank from langchain-cohere.
Request
from langchain_cohere.rerank import CohereRerank
aios_base_url = "<<aios endpoint-url>>" # Enter the aios endpoint-url for AIOS model invocation.
model = "<<model>>" # Enter the model ID for AIOS model invocation.
rerank = CohereRerank(
base_url=aios_base_url,
cohere_api_key="EMPTY_KEY",
model=model
)
docs = [
"The capital of France is Paris.",
"France capital city is known for the Eiffel Tower.",
"Paris is located in the north-central part of France."
]
rerank.rerank(
documents=docs,
query="What is the capital of France?",
top_n=3
)
Response
[
{'index': 0, 'relevance_score': 1.0},
{'index': 1, 'relevance_score': 1.0},
{'index': 2, 'relevance_score': 0.982421875}
]
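The index and relevance_score fields can be used to put the original documents back in ranked order; a minimal sketch using the result shape shown above:

```python
docs = [
    "The capital of France is Paris.",
    "France capital city is known for the Eiffel Tower.",
    "Paris is located in the north-central part of France.",
]
results = [
    {'index': 0, 'relevance_score': 1.0},
    {'index': 1, 'relevance_score': 1.0},
    {'index': 2, 'relevance_score': 0.982421875},
]

# Sort by score (highest first) and map each index back to its document.
ranked = [docs[r['index']]
          for r in sorted(results, key=lambda r: r['relevance_score'], reverse=True)]
print(ranked[0])  # The capital of France is Paris.
```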