Model

AgentScope provides unified async abstractions for various AI models across different providers:

Chat Models

Core text generation with reasoning, streaming, and tools API support.

TTS Models

Convert text to speech with realtime and non-realtime options.

Realtime Models

Bidirectional WebSocket streaming for low-latency voice agents.

Embedding Models

Generate vector representations for retrieval and similarity search.

Chat Model

Chat models are the core of the agent, enabling it to generate streaming/non-streaming responses, perform reasoning, and call tools.

The streaming mode in AgentScope chat models is accumulative — each yielded response contains all content generated so far, not just the latest delta. This design simplifies consumption since you always have the complete current state without tracking deltas.

API Provider	Class	Description
OpenAI	`OpenAIChatModel`	The OpenAI-compatible chat model, supporting OpenAI, Amazon OpenAI, vLLM, DeepSeek, and any model with an OpenAI-compatible API.
DashScope	`DashScopeChatModel`	The unified DashScope API that supports both chat models and multimodal models (e.g. qwen-vl, qwen3.5-plus).
Anthropic	`AnthropicChatModel`	Anthropic Claude models, supporting both chat and multimodal models (e.g. claude-2, claude-instant-100k).
Gemini	`GeminiChatModel`	Google Gemini models.
Ollama	`OllamaChatModel`	Ollama’s local LLM hosting solution.

To support multi-agent conversations in a chatbot format, AgentScope designs a formatter layer that

converts AgentScope’s Msg objects into the expected input format for each LLM API, and
adopts multi-agent conversation context into the two-role chatbot format by prefixing messages with agent names and wrapping them in <history> tags.

Such formatters are distinguished by the suffix ChatFormatter (e.g., DashScopeChatFormatter) and MultiAgentFormatter (e.g., DashScopeMultiAgentFormatter) — the former is for two-party conversations (user + assistant), while the latter is for multi-agent conversations.

For detailed usage examples and a full provider reference table mapping each model class to its corresponding formatter, see Models.

TTS Model

TTS (Text-to-Speech) models convert text into audio. AgentScope supports both non-realtime and realtime TTS models:

API Provider	Class	Description
DashScope	`DashScopeTTSModel`	Non-realtime TTS
DashScope	`DashScopeRealtimeTTSModel`	Realtime TTS with streaming text input for minimal latency
DashScope CosyVoice	`DashScopeCosyVoiceTTSModel`	Non-realtime TTS with enhanced expressiveness and naturalness via DashScope’s CosyVoice technology
DashScope CosyVoice	`DashScopeCosyVoiceRealtimeTTSModel`	Realtime TTS with CosyVoice technology for the most natural and expressive speech synthesis
OpenAI	`OpenAITTSModel`	OpenAI’s TTS model, supporting high-quality speech synthesis with various voice options.
Gemini	`GeminiTTSModel`	Google Gemini’s TTS model, offering natural and expressive speech synthesis.

Realtime Model

Realtime models provide bidirectional, persistent communication over WebSocket, designed primarily for voice agent scenarios where the user speaks and the model responds with speech in real-time.

API Provider	Class	Description
OpenAI	`OpenAIRealtimeModel`	OpenAI’s realtime model, supporting audio and text input, tool use, and server-side VAD for voice activity detection.
DashScope	`DashScopeRealtimeModel`	DashScope’s realtime model, supporting audio and image input for rich multimodal interactions.
Gemini	`GeminiRealtimeModel`	Google’s Gemini realtime model, supporting audio, text, image input, tool use, and server-side VAD for voice activity detection.

Embedding Model

Embedding models generate vector representations for text, images, and other data types. These embeddings are used for retrieval, similarity search, and as input features for downstream tasks.

API Provider	Class	Description
OpenAI	`OpenAITextEmbedding`	OpenAI’s text embedding API.
DashScope	`DashScopeTextEmbedding`	DashScope’s text embedding model.
	`DashScopeMultiModalEmbedding`	DashScope’s multimodal embedding model, generating unified embeddings for both text and images, enabling cross-modal retrieval and understanding.
Gemini	`GeminiTextEmbedding`	Google Gemini’s text embedding model.
Ollama	`OllamaTextEmbedding`	Ollama’s local embedding model for text data.

Get Started

Tutorial

Out-of-box Agents

Basic Concepts

Building Blocks

Observe & Evaluate

Tune Agent

Deploy & Serve

Others

Chat Models

TTS Models

Realtime Models

Embedding Models

Chat Model

TTS Model

Realtime Model

Embedding Model

Get Started

Tutorial

Out-of-box Agents

Basic Concepts

Building Blocks

Observe & Evaluate

Tune Agent

Deploy & Serve

Others

Chat Models

TTS Models

Realtime Models

Embedding Models

​Chat Model

​TTS Model

​Realtime Model

​Embedding Model

Chat Model

TTS Model

Realtime Model

Embedding Model