# AI Provider Testing Guide
This guide covers how to manually test the AI dialogue system with different LLM providers. Use this to verify your configuration and ensure NPCs respond correctly.
## Supported Providers

MAID supports three AI providers out of the box:

| Provider | Name | Best For | Requirements |
|---|---|---|---|
| Anthropic Claude | `anthropic` | Production use, quality responses | API key, `anthropic` package |
| OpenAI GPT | `openai` | Alternative production option | API key, `openai` package |
| Ollama | `ollama` | Local development, offline use | Ollama server, `httpx` package |
## Testing with Anthropic (Claude)
Anthropic Claude is the default and recommended provider for production use.
### Setup

1. **Get an API Key**

    - Visit [console.anthropic.com](https://console.anthropic.com)
    - Create an account and generate an API key
    - Note: API usage incurs costs

2. **Configure the Environment**

    Create or update your `.env` file:

    ```bash
    # Required
    MAID_AI_ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

    # Optional: Override default model
    MAID_AI_ANTHROPIC_MODEL=claude-sonnet-4-20250514

    # Optional: Set as default provider (it already is by default)
    MAID_AI_DEFAULT_PROVIDER=anthropic
    ```

3. **Install the Package**
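    One way to install it, assuming the project manages dependencies with uv (as the `uv run` commands in this guide suggest):

    ```bash
    # Hypothetical command; with plain pip, use `pip install anthropic`
    uv add anthropic
    ```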
### Available Models

| Model | Description | Use Case |
|---|---|---|
| `claude-sonnet-4-20250514` | Latest Sonnet (default) | Production, best balance |
| `claude-opus-4-20250514` | Opus - highest quality | Complex characters |
| `claude-3-5-haiku-20241022` | Haiku - fastest | High-traffic, simple NPCs |
### Test Commands

Start the server and test:

```bash
# Start the server
uv run maid server start --debug

# In another terminal, connect via telnet
telnet localhost 4000

# After logging in and selecting a character:
> talk bartender Hello! What's the news today?
> ask guard about the curfew
> greet merchant
```
### Programmatic Testing

```python
import asyncio

from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions


async def test_anthropic():
    provider = AnthropicProvider(
        api_key="sk-ant-api03-your-key-here",
        default_model="claude-sonnet-4-20250514",
    )
    messages = [
        Message.system("You are a gruff dwarf blacksmith named Grumbar."),
        Message.user("Hello! Do you have any weapons for sale?"),
    ]
    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))
    print(f"Response: {result.content}")
    print(f"Tokens used: {result.usage}")


asyncio.run(test_anthropic())
```
### Streaming Test

```python
async def test_anthropic_streaming():
    provider = AnthropicProvider(api_key="sk-ant-api03-your-key-here")
    messages = [
        Message.system("You are a mysterious merchant."),
        Message.user("What rare items do you have?"),
    ]
    print("Streaming response: ", end="")
    async for chunk in provider.complete_streaming(messages):
        print(chunk, end="", flush=True)
    print()


asyncio.run(test_anthropic_streaming())
```
## Testing with OpenAI (GPT-4)

OpenAI's GPT models are an alternative for teams that prefer, or are already invested in, the OpenAI platform.
### Setup

1. **Get an API Key**

    - Visit [platform.openai.com](https://platform.openai.com)
    - Create an account and generate an API key
    - Note: API usage incurs costs

2. **Configure the Environment**

    ```bash
    # Required
    MAID_AI_OPENAI_API_KEY=sk-your-key-here

    # Optional: Override default model
    MAID_AI_OPENAI_MODEL=gpt-4o

    # Required if using as primary provider
    MAID_AI_DEFAULT_PROVIDER=openai
    ```

3. **Install the Package**
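    As with Anthropic, this sketch assumes a uv-managed project; adapt to your workflow:

    ```bash
    # Hypothetical command; with plain pip, use `pip install openai`
    uv add openai
    ```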
### Available Models

| Model | Description | Use Case |
|---|---|---|
| `gpt-4o` | Latest GPT-4o (default) | Production, best quality |
| `gpt-4o-mini` | Smaller, faster | High-traffic, cost-sensitive |
| `gpt-4-turbo` | GPT-4 Turbo | Alternative to gpt-4o |
| `gpt-3.5-turbo` | Fastest, cheapest | Simple NPCs, testing |
### Test Commands

```bash
# Ensure OpenAI is configured
export MAID_AI_DEFAULT_PROVIDER=openai
export MAID_AI_OPENAI_API_KEY=sk-your-key-here

# Start the server
uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk bartender What ales do you have on tap?
```
### Programmatic Testing

```python
import asyncio

from maid_engine.ai.providers.openai import OpenAIProvider
from maid_engine.ai.providers.base import Message, CompletionOptions


async def test_openai():
    provider = OpenAIProvider(
        api_key="sk-your-key-here",
        default_model="gpt-4o",
    )
    messages = [
        Message.system("You are a wise old sage in a fantasy library."),
        Message.user("What can you tell me about dragons?"),
    ]
    result = await provider.complete(messages, CompletionOptions(
        max_tokens=200,
        temperature=0.6,
    ))
    print(f"Response: {result.content}")
    print(f"Model used: {result.model}")
    print(f"Tokens: {result.usage}")


asyncio.run(test_openai())
```
## Testing with Ollama (Local)
Ollama runs LLMs locally on your machine, ideal for development and offline testing.
### Setup

1. **Install Ollama**

    Visit [ollama.ai](https://ollama.ai) and download for your platform.

2. **Start the Ollama Server** (see the command sketch after this list)

3. **Pull a Model**

4. **Configure the Environment**
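A sketch of steps 2-4, using the same commands that appear under Test Commands below:

```bash
# Step 2: start the Ollama server (keep this terminal open)
ollama serve

# Step 3: pull a model (see the table below for other options)
ollama pull llama3.2

# Step 4: point MAID at Ollama
export MAID_AI_DEFAULT_PROVIDER=ollama
export MAID_AI_OLLAMA_MODEL=llama3.2
```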
### Available Models

Models must be pulled before use. Common options:

| Model | Size | Speed | Quality | Command |
|---|---|---|---|---|
| `llama3.2` | 2GB | Fast | Good | `ollama pull llama3.2` |
| `llama3.1` | 4.7GB | Medium | Better | `ollama pull llama3.1` |
| `llama3.1:70b` | 40GB | Slow | Best | `ollama pull llama3.1:70b` |
| `mistral` | 4.1GB | Fast | Good | `ollama pull mistral` |
| `mixtral` | 26GB | Slow | Excellent | `ollama pull mixtral` |
| `phi3` | 2.2GB | Fastest | Basic | `ollama pull phi3` |
| `codellama` | 3.8GB | Medium | Good | `ollama pull codellama` |
### Test Commands

```bash
# Ensure Ollama is running
ollama serve

# In another terminal, configure and start MAID
export MAID_AI_DEFAULT_PROVIDER=ollama
export MAID_AI_OLLAMA_MODEL=llama3.2
uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk guard What are the town's laws?
```
### Programmatic Testing

```python
import asyncio

from maid_engine.ai.providers.ollama import OllamaProvider
from maid_engine.ai.providers.base import Message, CompletionOptions


async def test_ollama():
    provider = OllamaProvider(
        host="http://localhost:11434",
        default_model="llama3.2",
    )

    # Check if server is running
    if not await provider.is_available():
        print("Ollama server not available. Run 'ollama serve' first.")
        return

    # List installed models
    models = await provider.list_local_models()
    print(f"Installed models: {models}")

    # Test completion
    messages = [
        Message.system("You are a friendly tavern keeper."),
        Message.user("What's on the menu today?"),
    ]
    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))
    print(f"Response: {result.content}")


asyncio.run(test_ollama())
```
### Troubleshooting Ollama

**Server not starting:**

```bash
# Check if already running
curl http://localhost:11434/api/tags

# Kill existing process and restart
pkill ollama
ollama serve
```
**Model not found:**
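If the configured model has not been pulled yet, list what is installed and pull the missing one:

```bash
# Show locally installed models
ollama list

# Pull the model MAID is configured to use
ollama pull llama3.2
```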
**Out of memory:**

- Use a smaller model (`phi3`, `llama3.2`)
- Close other applications
- Consider using a cloud provider instead
## CLI Testing Tool

MAID includes a CLI command for testing AI providers directly:

```bash
# Test default provider
uv run maid dev test-ai "Tell me about yourself"

# Test specific provider
uv run maid dev test-ai "Hello!" --provider anthropic
uv run maid dev test-ai "Hello!" --provider openai
uv run maid dev test-ai "Hello!" --provider ollama

# Test with custom settings
uv run maid dev test-ai "What's the weather?" --temperature 0.9 --max-tokens 200
```
## Verification Checklist

Use this checklist to verify your AI dialogue setup is working correctly:

### Provider Configuration

- [ ] API key is set (for Anthropic/OpenAI) or server is running (Ollama)
- [ ] Provider package is installed (`anthropic`, `openai`, or `httpx`)
- [ ] Default provider is configured in environment
- [ ] Test prompt returns a response
### Basic Dialogue

- [ ] `talk <npc> <message>` returns AI response
- [ ] `ask <npc> about <topic>` returns relevant response
- [ ] `greet <npc>` initiates conversation
- [ ] Response streams correctly (appears word by word)
### Conversation History

- [ ] NPC remembers context from previous messages
- [ ] `conversations` command lists active conversations
- [ ] `endconversation <npc>` properly ends conversation
- [ ] Conversation times out after configured period
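A quick in-game sequence to exercise these checks, using the commands named above:

```
> talk bartender My name is Ragnar, remember it!
> talk bartender What's my name?
> conversations
> endconversation bartender
```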
### Rate Limiting
- [ ] Rapid requests show "wait" message
- [ ] NPC cooldown prevents spam
- [ ] Token budget limits are enforced
- [ ] Rate limit messages are player-friendly
### Content Filtering

- [ ] NPCs stay in character
- [ ] NPCs don't acknowledge being AI
- [ ] Inappropriate requests are deflected
- [ ] `wont_discuss` topics are avoided
### Fallback Behavior

- [ ] Setting `ai_enabled=False` uses fallback responses
- [ ] API errors fall back to `fallback_response`
- [ ] Disabling global AI uses fallback responses
- [ ] Non-existent NPCs show appropriate error
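A hypothetical sketch of the settings these checks exercise, combining field names from the checklists above with the `DialogueComponent` constructor shown under "Switching Providers at Runtime" (verify the exact fields against your component definition):

```python
# Hypothetical field combination; the actual DialogueComponent fields may differ
hermit = DialogueComponent(
    provider_name="anthropic",
    ai_enabled=False,  # skip the LLM and use canned responses
    fallback_response="The hermit grunts and waves you away.",
    wont_discuss=["the royal assassination"],  # topics the NPC deflects
)
```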
### Context Injection

- [ ] World context affects responses (time of day, weather)
- [ ] Location context affects responses (room description)
- [ ] Player context affects responses (name, level)
- [ ] Disabling context options reduces prompt size
## Performance Testing

### Response Time Benchmarks
Test response times for different configurations:
```python
import asyncio
import time

from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions


async def benchmark_provider(provider, iterations=5):
    messages = [
        Message.system("You are a tavern keeper. Respond briefly."),
        Message.user("What drinks do you have?"),
    ]
    options = CompletionOptions(max_tokens=100, temperature=0.7)
    times = []
    for i in range(iterations):
        start = time.time()
        await provider.complete(messages, options)
        elapsed = time.time() - start
        times.append(elapsed)
        print(f"  Iteration {i + 1}: {elapsed:.2f}s")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.2f}s")
    return avg


async def main():
    # Benchmark each model with its own provider instance, since the
    # model is selected via the provider's default_model.
    for model in ("claude-3-5-haiku-20241022", "claude-sonnet-4-20250514"):
        provider = AnthropicProvider(api_key="sk-ant-...", default_model=model)
        print(f"Testing {model}:")
        await benchmark_provider(provider)


asyncio.run(main())
```
### Expected Response Times
| Provider | Model | Typical Response Time |
|---|---|---|
| Anthropic | claude-3-5-haiku | 0.5-1.5s |
| Anthropic | claude-sonnet-4 | 1-3s |
| Anthropic | claude-opus-4 | 2-5s |
| OpenAI | gpt-4o-mini | 0.5-1.5s |
| OpenAI | gpt-4o | 1-3s |
| Ollama | llama3.2 (local) | 0.5-2s |
| Ollama | mistral (local) | 0.3-1s |
Note: Times vary based on network latency, server load, and prompt length.
## Switching Providers at Runtime

You can configure a different provider for each NPC:

```python
# Use Anthropic for important NPCs
important_npc = DialogueComponent(
    provider_name="anthropic",
    model_name="claude-sonnet-4-20250514",
    # ... other settings
)

# Use local Ollama for background NPCs
background_npc = DialogueComponent(
    provider_name="ollama",
    model_name="phi3",
    # ... other settings
)

# Use OpenAI for variety
special_npc = DialogueComponent(
    provider_name="openai",
    model_name="gpt-4o",
    # ... other settings
)
```
This allows you to:

- Use expensive models for key characters
- Use fast/cheap models for background NPCs
- Test different providers in production
- Fall back to local models if API is down
## Security Considerations

### API Key Protection

- Never commit API keys to version control
- Use environment variables or `.env` files
- Add `.env` to `.gitignore`
- Rotate keys if they may have been exposed
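For example:

```bash
# Keep local secrets out of version control
echo ".env" >> .gitignore
```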
### Rate Limiting

Configure rate limits to prevent abuse and cost overruns:

```bash
# Conservative limits for production
MAID_AI_DIALOGUE_GLOBAL_RATE_LIMIT_RPM=60
MAID_AI_DIALOGUE_PER_PLAYER_RATE_LIMIT_RPM=10
MAID_AI_DIALOGUE_DAILY_TOKEN_BUDGET=50000
MAID_AI_DIALOGUE_PER_PLAYER_DAILY_BUDGET=2000
```
### Logging

Be careful with conversation logging, since logs can contain players' private messages: