Chat Models
Core text generation with reasoning, streaming, and tools API support.
TTS Models
Convert text to speech with realtime and non-realtime options.
Realtime Models
Bidirectional WebSocket streaming for low-latency voice agents.
Embedding Models
Generate vector representations for retrieval and similarity search.
Chat Model
Chat models are the core of the agent, enabling it to generate streaming/non-streaming responses, perform reasoning, and call tools.The streaming mode in AgentScope chat models is accumulative — each yielded response contains all content generated so far, not just the latest delta. This design simplifies consumption since you always have the complete current state without tracking deltas.
| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAIChatModel | The OpenAI-compatible chat model, supporting OpenAI, Amazon OpenAI, vLLM, DeepSeek, and any model with an OpenAI-compatible API. |
| DashScope | DashScopeChatModel | The unified DashScope API that supports both chat models and multimodal models (e.g. qwen-vl, qwen3.5-plus). |
| Anthropic | AnthropicChatModel | Anthropic Claude models, supporting both chat and multimodal models (e.g. claude-2, claude-instant-100k). |
| Gemini | GeminiChatModel | Google Gemini models. |
| Ollama | OllamaChatModel | Ollama’s local LLM hosting solution. |
- converts AgentScope’s
Msgobjects into the expected input format for each LLM API, and - adopts multi-agent conversation context into the two-role chatbot format by prefixing messages with agent names and wrapping them in
<history>tags.
ChatFormatter (e.g., DashScopeChatFormatter) and MultiAgentFormatter (e.g., DashScopeMultiAgentFormatter) — the former is for two-party conversations (user + assistant), while the latter is for multi-agent conversations.
TTS Model
TTS (Text-to-Speech) models convert text into audio. AgentScope supports both non-realtime and realtime TTS models:| API Provider | Class | Description |
|---|---|---|
| DashScope | DashScopeTTSModel | Non-realtime TTS |
| DashScope | DashScopeRealtimeTTSModel | Realtime TTS with streaming text input for minimal latency |
| DashScope CosyVoice | DashScopeCosyVoiceTTSModel | Non-realtime TTS with enhanced expressiveness and naturalness via DashScope’s CosyVoice technology |
| DashScope CosyVoice | DashScopeCosyVoiceRealtimeTTSModel | Realtime TTS with CosyVoice technology for the most natural and expressive speech synthesis |
| OpenAI | OpenAITTSModel | OpenAI’s TTS model, supporting high-quality speech synthesis with various voice options. |
| Gemini | GeminiTTSModel | Google Gemini’s TTS model, offering natural and expressive speech synthesis. |
Realtime Model
Realtime models provide bidirectional, persistent communication over WebSocket, designed primarily for voice agent scenarios where the user speaks and the model responds with speech in real-time.| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAIRealtimeModel | OpenAI’s realtime model, supporting audio and text input, tool use, and server-side VAD for voice activity detection. |
| DashScope | DashScopeRealtimeModel | DashScope’s realtime model, supporting audio and image input for rich multimodal interactions. |
| Gemini | GeminiRealtimeModel | Google’s Gemini realtime model, supporting audio, text, image input, tool use, and server-side VAD for voice activity detection. |
Embedding Model
Embedding models generate vector representations for text, images, and other data types. These embeddings are used for retrieval, similarity search, and as input features for downstream tasks.| API Provider | Class | Description |
|---|---|---|
| OpenAI | OpenAITextEmbedding | OpenAI’s text embedding API. |
| DashScope | DashScopeTextEmbedding | DashScope’s text embedding model. |
DashScopeMultiModalEmbedding | DashScope’s multimodal embedding model, generating unified embeddings for both text and images, enabling cross-modal retrieval and understanding. | |
| Gemini | GeminiTextEmbedding | Google Gemini’s text embedding model. |
| Ollama | OllamaTextEmbedding | Ollama’s local embedding model for text data. |