This document covers detailed usage examples and provider-specific references for each model class in AgentScope.

  • ChatModel: Text generation, streaming, reasoning, and tools API.
  • TTS Models: Non-realtime and realtime text-to-speech synthesis.
  • Realtime Models: Bidirectional WebSocket streaming for voice agents.
  • Embedding Models: Vector representations for retrieval and similarity search.
For core concepts and design principles, see Model. For details on Msg and content blocks, see Msg.

ChatModel

Basic Usage

All chat model classes share a unified __call__ interface. The input to __call__ is the formatted messages — the result of applying a formatter to Msg objects. This formatted input matches the exact format expected by the underlying API provider. Method signature:
async def __call__(
    self,
    messages: list[dict],
    tools: list[dict] | None = None,
    tool_choice: Literal["auto", "none", "required"] | str | None = None,
    structured_model: Type[BaseModel] | None = None,
    **kwargs: Any,
) -> ChatResponse | AsyncGenerator[ChatResponse, None]:
    """
    Call the chat model with formatted messages.

    Args:
        messages: Formatted messages (provider-specific format)
        tools: Optional tool schemas
        tool_choice: Tool invocation mode
        structured_model: Optional Pydantic model for structured output
        **kwargs: Additional provider-specific parameters

    Returns:
        - ChatResponse: when stream=False
        - AsyncGenerator[ChatResponse, None]: when stream=True
    """
In AgentScope, agents communicate by passing Msg objects. When calling a model directly (outside an agent), the typical flow is:
  1. Build Msg objects with name, role, and content (text or content blocks)
  2. Use a Formatter to convert [Msg] into the provider-specific message format
  3. Call the ChatModel with the formatted messages to get a ChatResponse
When using an agent (e.g., ReActAgent), steps 2-3 are handled automatically — the agent internally manages the Msg → Formatter → Model → ChatResponse pipeline. Example workflow:
import asyncio
import os
from agentscope.formatter import DashScopeChatFormatter
from agentscope.model import DashScopeChatModel
from agentscope.message import Msg

async def example_model_call():
    # Setup: create model and formatter
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=False,
    )
    formatter = DashScopeChatFormatter()

    # Step 1: Build Msg objects
    user_msg = Msg(name="user", content="Hi!", role="user")

    # Step 2: Format messages (convert to provider-specific format)
    formatted_messages = await formatter.format([user_msg])

    # Step 3: Call model with formatted messages and get ChatResponse
    res = await model(formatted_messages)

    print("Response:", res.content)
    print("Usage:", res.usage)

asyncio.run(example_model_call())
The key point: ChatModel accepts formatted messages (the output of a formatter), not raw Msg objects. This design allows each model to receive input in its native API format. The model returns a ChatResponse object containing the generated content and usage information.

Streaming

To enable streaming, set stream=True in the constructor. When streaming is enabled, __call__ returns an async generator that yields ChatResponse instances.
Streaming in AgentScope is accumulative — each chunk contains all previous content plus newly generated content, not just the delta. This simplifies consumption since you always have the complete current state without tracking deltas.
async def example_streaming():
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=True,
    )
    formatter = DashScopeChatFormatter()

    user_msg = Msg(name="user", content="Count from 1 to 5.", role="user")
    formatted_messages = await formatter.format([user_msg])

    # Get async generator
    generator = await model(formatted_messages)

    # Iterate through chunks (each contains accumulated content)
    async for chunk in generator:
        print(chunk.content)  # Accumulated content up to this point

asyncio.run(example_streaming())
Example output (each line shows accumulative text):
[{'type': 'text', 'text': '1'}]
[{'type': 'text', 'text': '1\n2'}]
[{'type': 'text', 'text': '1\n2\n3'}]
[{'type': 'text', 'text': '1\n2\n3\n4'}]
[{'type': 'text', 'text': '1\n2\n3\n4\n5'}]
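Because each chunk repeats everything generated so far, a consumer that wants only the new text (for example, to print tokens as they arrive) has to subtract what it has already seen. The following is a minimal sketch of that idea, assuming each chunk's text extends the previous chunk's text; `text_of` and `iter_deltas` are hypothetical helpers, not part of AgentScope, and the chunks are plain lists shaped like the example output above.

```python
def text_of(content: list[dict]) -> str:
    """Concatenate the text of all text blocks in one chunk's content."""
    return "".join(b["text"] for b in content if b.get("type") == "text")


def iter_deltas(chunks):
    """Yield only the newly generated text for each accumulative chunk."""
    seen = ""
    for content in chunks:
        full = text_of(content)
        # Assumption: `full` always starts with `seen` (accumulative mode).
        yield full[len(seen):]
        seen = full


chunks = [
    [{"type": "text", "text": "1"}],
    [{"type": "text", "text": "1\n2"}],
    [{"type": "text", "text": "1\n2\n3"}],
]
deltas = list(iter_deltas(chunks))
print(deltas)  # ['1', '\n2', '\n3']
```

With a real model, the same subtraction would be applied to `chunk.content` inside the `async for` loop.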

Reasoning

AgentScope supports reasoning models (chain-of-thought) via ThinkingBlock. When enable_thinking=True, the model’s response includes both the thinking process and the final answer.
async def example_reasoning():
    model = DashScopeChatModel(
        model_name="qwen-turbo",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        enable_thinking=True,  # Enable reasoning
        stream=True,
    )
    formatter = DashScopeChatFormatter()

    user_msg = Msg(name="user", content="What is 17 * 23?", role="user")
    formatted_messages = await formatter.format([user_msg])

    res = await model(formatted_messages)

    # Collect final chunk
    last_chunk = None
    async for chunk in res:
        last_chunk = chunk

    # Response contains both ThinkingBlock and TextBlock
    for block in last_chunk.content:
        block_type = block['type']
        content = block.get('thinking') or block.get('text')
        print(f"[{block_type}] {content[:80]}...")

asyncio.run(example_reasoning())
The thinking content is streamed alongside text content in accumulative mode.
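When only the final answer should reach the user, the two block kinds can be separated after the last chunk arrives. The sketch below works on plain dictionaries and assumes that thinking blocks carry `type == "thinking"` with a `thinking` key and text blocks carry `type == "text"` with a `text` key, as the example above suggests; `split_blocks` is a hypothetical helper.

```python
def split_blocks(content: list[dict]) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a list of content blocks."""
    thinking = "".join(
        b.get("thinking", "") for b in content if b.get("type") == "thinking"
    )
    answer = "".join(
        b.get("text", "") for b in content if b.get("type") == "text"
    )
    return thinking, answer


# Stand-in for last_chunk.content from the example above.
final_content = [
    {"type": "thinking", "thinking": "17 * 23 = 17 * 20 + 17 * 3 = 340 + 51"},
    {"type": "text", "text": "391"},
]
thinking, answer = split_blocks(final_content)
print(answer)  # 391
```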

Tools API

AgentScope provides a unified tools interface across all providers. Tools are defined using a standardized JSON schema format and passed to the model via the tools parameter.
async def example_tools():
    model = DashScopeChatModel(
        model_name="qwen-max",
        api_key=os.environ["DASHSCOPE_API_KEY"],
        stream=False,
    )
    formatter = DashScopeChatFormatter()

    # Define tool schema
    json_schemas = [
        {
            "type": "function",
            "function": {
                "name": "google_search",
                "description": "Search for a query on Google.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "The search query.",
                        },
                    },
                    "required": ["query"],
                },
            },
        },
    ]

    user_msg = Msg(name="user", content="Search AgentScope release notes.", role="user")
    formatted_messages = await formatter.format([user_msg])

    # Call model with tools
    response = await model(
        messages=formatted_messages,
        tools=json_schemas,
        tool_choice="auto",  # "auto", "none", "required", or "<function_name>"
    )

    print(response.content)

asyncio.run(example_tools())
The tool_choice parameter controls invocation behavior:
  • "auto": Model decides whether to call a tool
  • "none": No tools will be called
  • "required": Model must call at least one tool
  • "<function_name>": Force a specific tool
Use the Toolkit class to auto-generate JSON schemas from Python functions with docstrings. See Tool for details.
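To illustrate what generating a schema from a Python function involves, here is a rough sketch that builds the same schema shape as the example above from a function signature and docstring. It is not the Toolkit implementation: `function_to_schema` is a hypothetical helper, it handles only a few primitive annotations, and it does not parse per-argument descriptions.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_schema(func: Callable[..., Any]) -> dict:
    """Build a function-style tool schema from a signature and docstring."""
    hints = get_type_hints(func)
    properties: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.splitlines()[0] if doc else "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def google_search(query: str, max_results: int = 5) -> str:
    """Search for a query on Google."""
    return f"results for {query}"


schema = function_to_schema(google_search)
print(schema["function"]["parameters"]["required"])  # ['query']
```

Parameters with defaults are left out of `required`, so the model may omit them.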

Provider Reference

AgentScope supports multiple chat model providers. Each provider has a corresponding model class and formatter:
| Provider | Model Class | Formatter | Key Features |
| --- | --- | --- | --- |
| OpenAI | OpenAIChatModel | OpenAIChatFormatter / OpenAIChatMultiAgentFormatter | Supports OpenAI, vLLM, DeepSeek, and OpenAI-compatible APIs |
| DashScope | DashScopeChatModel | DashScopeChatFormatter / DashScopeMultiAgentFormatter | Supports Qwen models, VL models, reasoning models |
| Gemini | GeminiChatModel | GeminiChatFormatter / GeminiMultiAgentFormatter | Google Gemini models with multimodal support |
| Anthropic | AnthropicChatModel | AnthropicChatFormatter / AnthropicMultiAgentFormatter | Claude models with extended thinking |
| Ollama | OllamaChatModel | OllamaChatFormatter / OllamaMultiAgentFormatter | Local LLM hosting |
For detailed provider-specific parameters and examples, refer to the AgentScope API documentation or source code.

Token Counting

AgentScope provides a token counter module under agentscope.token to estimate the number of tokens in a set of messages before sending them to a model. This is useful for managing context window budgets and implementing prompt truncation strategies.
The formatter module integrates token counters to support automatic prompt truncation. When a token budget is configured, the formatter uses the corresponding counter to trim messages before they are sent to the model.
Supported providers:
| Provider | Class | Image Data | Tools |
| --- | --- | --- | --- |
| Anthropic | AnthropicTokenCounter | | |
| OpenAI | OpenAITokenCounter | | |
| Gemini | GeminiTokenCounter | | |
| HuggingFace | HuggingFaceTokenCounter | Depends on the model | Depends on the model |
DashScope does not provide a token-counting API. For DashScope (Qwen) models, use HuggingFaceTokenCounter with the corresponding Qwen tokenizer instead.
import asyncio
from agentscope.token import OpenAITokenCounter

async def example_token_counting():
    messages = [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi, how can I help you?"},
    ]

    counter = OpenAITokenCounter(model_name="gpt-4.1")
    n_tokens = await counter.count(messages)

    print(f"Number of tokens: {n_tokens}")

asyncio.run(example_token_counting())
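The truncation idea mentioned above can be sketched independently of any provider. The example below drops the oldest non-system messages until a budget is met. It is only an illustration of one possible strategy, not the formatter's actual algorithm; `approx_count` is a hypothetical stand-in that counts whitespace-separated words instead of real tokens.

```python
from typing import Callable


def approx_count(messages: list[dict]) -> int:
    """Hypothetical counter: one 'token' per whitespace-separated word."""
    return sum(len(str(m["content"]).split()) for m in messages)


def truncate_to_budget(
    messages: list[dict],
    budget: int,
    count: Callable[[list[dict]], int] = approx_count,
) -> list[dict]:
    """Drop the oldest non-system messages until the prompt fits the budget."""
    kept = list(messages)
    while count(kept) > budget:
        # Index of the oldest message that is not a system prompt.
        idx = next((i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if idx is None:
            break  # Only system messages remain; nothing more to drop.
        del kept[idx]
    return kept


history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "first question about cats"},
    {"role": "assistant", "content": "an answer about cats"},
    {"role": "user", "content": "second question"},
]
trimmed = truncate_to_budget(history, budget=6)
print([m["content"] for m in trimmed])
# ['You are helpful.', 'second question']
```

In practice the `count` argument would wrap a real counter such as the one in the example above.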

TTS Models

TTS (Text-to-Speech) models convert text into audio. AgentScope supports both non-realtime and realtime TTS models.

Non-Realtime TTS

Non-realtime TTS models require complete text before synthesis. The core method is synthesize(), which accepts a Msg object and returns a TTSResponse containing audio data.
async def synthesize(self, msg: Msg) -> TTSResponse | AsyncGenerator[TTSResponse, None]:
    """
    Synthesize speech from text.

    Args:
        msg: A Msg object containing text content

    Returns:
        - TTSResponse: when stream=False (complete audio)
        - AsyncGenerator[TTSResponse, None]: when stream=True (audio chunks)
    """
Basic usage:
import asyncio
import os
from agentscope.tts import DashScopeTTSModel
from agentscope.message import Msg

async def example_non_realtime_tts():
    tts_model = DashScopeTTSModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen3-tts-flash",
        voice="Cherry",
        stream=False,
    )

    msg = Msg(name="assistant", content="Hello, this is a TTS demo.", role="assistant")
    tts_response = await tts_model.synthesize(msg)

    # tts_response.content contains an AudioBlock with base64-encoded audio
    print("Audio data length:", len(tts_response.content["source"]["data"]))

asyncio.run(example_non_realtime_tts())
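To listen to the result, the base64 payload has to be decoded and wrapped in a playable container. The sketch below does this with the standard library only. It assumes the decoded data is raw 16-bit mono PCM at 24 kHz, which is not stated in this document and may differ per model; `audio_block_to_wav` is a hypothetical helper, and the block here is a synthetic stand-in rather than real model output.

```python
import base64
import io
import wave


def audio_block_to_wav(block: dict, sample_rate: int = 24000) -> bytes:
    """Decode a base64 audio block and wrap the PCM samples in a WAV container.

    Assumption: the payload is raw 16-bit mono PCM at `sample_rate` Hz.
    """
    pcm = base64.b64decode(block["source"]["data"])
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Synthetic stand-in for tts_response.content: 0.1 s of silence.
silence = b"\x00\x00" * 2400
fake_block = {
    "type": "audio",
    "source": {"type": "base64", "data": base64.b64encode(silence).decode("ascii")},
}
wav_bytes = audio_block_to_wav(fake_block)

# Read the WAV back to confirm the header matches what was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    n_frames = check.getnframes()
    frame_rate = check.getframerate()
```

Writing `wav_bytes` to a `.wav` file then produces something a regular audio player can open, provided the format assumption holds.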
Streaming output (stream=True) returns audio chunks progressively:
async def example_streaming_tts():
    tts_model = DashScopeTTSModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen3-tts-flash",
        voice="Cherry",
        stream=True,
    )

    msg = Msg(name="assistant", content="Hello, streaming TTS.", role="assistant")

    async for tts_response in await tts_model.synthesize(msg):
        print("Received audio chunk:", len(tts_response.content["source"]["data"]))

asyncio.run(example_streaming_tts())

Realtime TTS

Realtime TTS models accept streaming text input: text chunks can be fed incrementally as they become available (e.g., from a streaming chat model). Synthesis can therefore start before the full text exists, which reduces the time to first audio. Core methods:
async def push(self, msg: Msg) -> TTSResponse:
    """
    Non-blocking. Submit text chunk and return any audio received so far.
    """

async def synthesize(self, msg: Msg) -> TTSResponse | AsyncGenerator[TTSResponse, None]:
    """
    Blocking. Finalize the session and return all remaining audio.
    """
Key concepts:
  • Stateful processing: Only one streaming session can be active at a time, identified by msg.id
  • Incremental input: Use push() to submit text chunks as they arrive
  • Finalization: Use synthesize() to complete the session and get remaining audio
Usage example:
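import asyncio
import os
from agentscope.message import Msg
from agentscope.tts import DashScopeRealtimeTTSModel

async def example_realtime_tts():
    tts_model = DashScopeRealtimeTTSModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen3-tts-flash-realtime",
        voice="Cherry",
        stream=False,
    )

    # Reuse one Msg so that every chunk carries the same msg.id (one session).
    # Assumption: Msg.content can be reassigned between pushes.
    msg = Msg(name="assistant", content="Hello,", role="assistant")

    async with tts_model:
        # Push accumulative text chunks (non-blocking)
        res = await tts_model.push(msg)
        msg.content = "Hello, this is"
        res = await tts_model.push(msg)
        # ...
        # Finalize and get all remaining audio (blocking)
        msg.content = "Hello, this is a realtime TTS demo."
        res = await tts_model.synthesize(msg)

asyncio.run(example_realtime_tts())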
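The session rules listed above (one active session keyed by message id, incremental pushes, explicit finalization) can be modelled with a small toy class. This is only an illustration of the protocol, not how DashScopeRealtimeTTSModel is implemented: `ToyRealtimeSession` is hypothetical, takes an id and text instead of a Msg, and returns the newly submitted text where a real model would return audio.

```python
import asyncio


class ToyRealtimeSession:
    """Toy stand-in for a realtime TTS session; returns text, not audio."""

    def __init__(self) -> None:
        self.active_id: str | None = None
        self.sent = ""  # Text already forwarded for synthesis.

    async def push(self, msg_id: str, text: str) -> str:
        """Submit accumulative text and return only the newly added part."""
        if self.active_id is None:
            self.active_id = msg_id
        elif self.active_id != msg_id:
            raise RuntimeError("another session is still active")
        delta = text[len(self.sent):]
        self.sent = text
        return delta

    async def synthesize(self, msg_id: str, text: str) -> str:
        """Submit the final text, then close the session."""
        delta = await self.push(msg_id, text)
        self.active_id, self.sent = None, ""
        return delta


async def demo() -> tuple[str, str, str]:
    session = ToyRealtimeSession()
    first = await session.push("m1", "Hello,")
    second = await session.push("m1", "Hello, world")
    last = await session.synthesize("m1", "Hello, world!")
    return first, second, last


print(asyncio.run(demo()))  # ('Hello,', ' world', '!')
```

After `synthesize()` returns, the toy session is free again, so a message with a different id can start a new one.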
Integration with Agent: AgentScope agents can automatically synthesize speech when provided with a TTS model. The agent handles the streaming text → TTS pipeline internally.
import os
from agentscope.agent import ReActAgent
from agentscope.formatter import DashScopeChatFormatter
from agentscope.model import DashScopeChatModel
from agentscope.tts import DashScopeRealtimeTTSModel

agent = ReActAgent(
    name="Assistant",
    sys_prompt="You are a helpful assistant.",
    model=DashScopeChatModel(
        api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
        model_name="qwen-max",
        stream=True,
    ),
    formatter=DashScopeChatFormatter(),
    tts_model=DashScopeRealtimeTTSModel(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        model_name="qwen3-tts-flash-realtime",
        voice="Cherry",
    ),
)
When the agent generates streaming text responses, the TTS model automatically converts them to speech in real-time.

Realtime Models

Realtime models provide bidirectional, persistent communication over WebSocket, designed primarily for voice agent scenarios where the user speaks and the model responds with speech in real-time.

Principle

Realtime models maintain a persistent WebSocket connection that supports:
  • Bidirectional streaming: Audio/text input and audio/text output flow simultaneously
  • Low latency: Server-side VAD (Voice Activity Detection) enables natural turn-taking
  • Multimodal input: Audio, text, images (provider-dependent)
  • Tool support: Some providers support function calling in realtime (e.g., OpenAI, Gemini)
The key difference from traditional chat models is that realtime models handle the entire voice interaction pipeline (ASR + LLM + TTS) in a single, optimized connection, minimizing latency.

Usage with RealtimeAgent

AgentScope provides RealtimeAgent to work with realtime models. The agent handles the WebSocket connection, audio streaming, and message exchange automatically.
import asyncio
import os
from agentscope.agent import RealtimeAgent
from agentscope.realtime import OpenAIRealtimeModel

async def example_realtime_agent():
    # Create realtime model
    realtime_model = OpenAIRealtimeModel(
        model_name="gpt-4o-realtime-preview",
        api_key=os.environ["OPENAI_API_KEY"],
        voice="alloy",
    )

    # Create realtime agent
    agent = RealtimeAgent(
        name="VoiceAssistant",
        sys_prompt="You are a helpful voice assistant.",
        model=realtime_model,
    )

    # Start conversation (handles audio I/O automatically)
    await agent.start()

asyncio.run(example_realtime_agent())
The RealtimeAgent manages:
  • WebSocket connection lifecycle
  • Audio input/output streaming
  • Turn-taking and interruption handling
  • Tool execution (if supported by the model)
Supported providers:
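| Provider | Model Class | Audio I/O (input / output) | Tool Support | VAD |
| --- | --- | --- | --- | --- |
| OpenAI | OpenAIRealtimeModel | 24 kHz / 24 kHz | Yes | Yes |
| DashScope | DashScopeRealtimeModel | 16 kHz / 24 kHz | No | Yes |
| Gemini | GeminiRealtimeModel | 16 kHz / 24 kHz | Yes | Yes |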

Embedding Models

Embedding models generate vector representations for text, images, and other data types. These embeddings are used for retrieval, similarity search, and as input features for downstream tasks.

Core Method

All embedding models share a unified __call__ interface that accepts input data and returns an EmbeddingResponse:
async def __call__(
    self,
    inputs: List[str | TextBlock] | List[TextBlock | ImageBlock | VideoBlock],
    **kwargs,
) -> EmbeddingResponse:
    """
    Generate embeddings for input data.

    Args:
        inputs: Text strings, TextBlocks, or multimodal content blocks

    Returns:
        EmbeddingResponse containing:
        - embeddings: List of embedding vectors
        - usage: Token count and time information
        - source: "api" or "cache"
    """

Text Embedding

Text embedding models accept text strings or TextBlock objects:
import asyncio
import os
from agentscope.embedding import DashScopeTextEmbedding

async def example_text_embedding():
    embedding_model = DashScopeTextEmbedding(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        model_name="text-embedding-v3",
    )

    # Embed text strings
    texts = ["Hello world", "AgentScope is awesome"]
    response = await embedding_model(texts)

    print(f"Generated {len(response.embeddings)} embeddings")
    print(f"Embedding dimension: {len(response.embeddings[0])}")
    print(f"Tokens used: {response.usage.tokens}")

asyncio.run(example_text_embedding())
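Once embeddings are available, similarity search usually comes down to comparing vectors. A minimal sketch using cosine similarity on plain Python lists follows; the vectors are made-up stand-ins for `response.embeddings`, and `cosine_similarity` and `rank` are hypothetical helpers rather than AgentScope functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: list[float], docs: list[list[float]]) -> list[int]:
    """Return document indices ordered from most to least similar."""
    return sorted(
        range(len(docs)),
        key=lambda i: cosine_similarity(query, docs[i]),
        reverse=True,
    )


query_vec = [1.0, 0.0]
doc_vecs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
order = rank(query_vec, doc_vecs)
print(order)  # [2, 1, 0]
```

For large collections a vector store or a NumPy-based implementation would be the more practical choice; the pure-Python version only shows the computation.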

Multimodal Embedding

Multimodal embedding models accept text, images, and videos using content blocks:
import asyncio
import os
from agentscope.embedding import DashScopeMultiModalEmbedding
from agentscope.message import TextBlock

async def example_multimodal_embedding():
    embedding_model = DashScopeMultiModalEmbedding(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        model_name="multimodal-embedding-v1",
        dimensions=1024,
    )

    # Embed text content (multimodal model also supports text)
    inputs = [
        TextBlock(type="text", text="A beautiful sunset"),
        TextBlock(type="text", text="AgentScope framework"),
    ]

    response = await embedding_model(inputs)

    print(f"Generated {len(response.embeddings)} embeddings")
    print(f"Embedding dimension: {len(response.embeddings[0])}")

asyncio.run(example_multimodal_embedding())
For image and video inputs, use ImageBlock with URLSource (for publicly accessible URLs) or Base64Source (for base64-encoded data). The example above uses text for simplicity.
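The two image source kinds can be pictured as plain dictionaries. The sketch below builds both shapes. The field names (`type`, `url`, `media_type`, `data`) are assumptions based on the block and source names mentioned above and should be checked against the Msg documentation; the helper functions and the example URL are hypothetical.

```python
import base64


def image_block_from_url(url: str) -> dict:
    """Image block that references a publicly accessible URL."""
    return {"type": "image", "source": {"type": "url", "url": url}}


def image_block_from_bytes(data: bytes, media_type: str = "image/png") -> dict:
    """Image block that carries the image inline as base64 text."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("ascii"),
        },
    }


url_block = image_block_from_url("https://example.com/sunset.png")
# First 8 bytes of a PNG file, used as a stand-in for real image data.
raw = b"\x89PNG\r\n\x1a\n"
inline_block = image_block_from_bytes(raw)
```

A list mixing such blocks with text blocks would then be passed to the embedding model in the same way as the text-only `inputs` above.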

Provider Reference

AgentScope supports multiple embedding model providers:
| Provider | Model Class | Supported Modalities | Key Features |
| --- | --- | --- | --- |
| OpenAI | OpenAITextEmbedding | Text | High-quality text embeddings with configurable dimensions |
| DashScope | DashScopeTextEmbedding | Text | Qwen-based text embeddings |
| DashScope | DashScopeMultiModalEmbedding | Text, Image, Video | Unified embeddings for cross-modal retrieval |
| Gemini | GeminiTextEmbedding | Text | Google Gemini text embeddings |
| Ollama | OllamaTextEmbedding | Text | Local embedding models |
Common parameters:
  • api_key: API key for authentication
  • model_name: The embedding model identifier
  • dimensions: Embedding vector dimension (provider-dependent)
  • embedding_cache: Optional cache instance to avoid repeated API calls
Usage tips:
  • Use text embedding models for semantic search, clustering, and classification tasks.
  • Use multimodal embedding models for cross-modal retrieval (e.g., search images by text).
  • Enable caching for frequently embedded content to reduce API costs.
  • Batch multiple inputs in a single call for better efficiency.
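The caching and batching tips can be combined in a thin wrapper around any embedding function. The following is a sketch of the idea only, not the behaviour of the `embedding_cache` parameter: `CachedEmbedder` and `fake_embed` are hypothetical, the cache lives in memory, and the wrapper is synchronous for brevity.

```python
import hashlib
from typing import Callable


class CachedEmbedder:
    """Cache embeddings by content hash and send cache misses in batches."""

    def __init__(
        self,
        embed_fn: Callable[[list[str]], list[list[float]]],
        batch_size: int = 2,
    ) -> None:
        self._embed_fn = embed_fn
        self._batch_size = batch_size
        self._cache: dict[str, list[float]] = {}
        self.api_calls = 0  # Number of times embed_fn was invoked.

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Collect each uncached text once, preserving first-seen order.
        missing: list[str] = []
        for text in texts:
            if self._key(text) not in self._cache and text not in missing:
                missing.append(text)
        # Send the misses in batches instead of one call per text.
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            self.api_calls += 1
            for text, vector in zip(batch, self._embed_fn(batch)):
                self._cache[self._key(text)] = vector
        return [self._cache[self._key(text)] for text in texts]


def fake_embed(batch: list[str]) -> list[list[float]]:
    """Stand-in for a real embedding API: [length, number of spaces]."""
    return [[float(len(t)), float(t.count(" "))] for t in batch]


embedder = CachedEmbedder(fake_embed, batch_size=2)
first = embedder.embed(["a b", "hello", "a b", "xyz"])
calls_after_first = embedder.api_calls   # 2: three unique texts, two batches
second = embedder.embed(["hello"])
calls_after_second = embedder.api_calls  # still 2: served from the cache
```

Hashing the text keeps cache keys a fixed size regardless of input length; a persistent store could replace the dictionary without changing the interface.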