AI Provider Testing Guide

This guide covers how to manually test the AI dialogue system with different LLM providers. Use this to verify your configuration and ensure NPCs respond correctly.

Supported Providers

MAID supports three AI providers out of the box:

| Provider | Name | Best For | Requirements |
| --- | --- | --- | --- |
| Anthropic Claude | anthropic | Production use, quality responses | API key, anthropic package |
| OpenAI GPT | openai | Alternative production option | API key, openai package |
| Ollama | ollama | Local development, offline use | Ollama server, httpx package |
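
Before wiring a provider into NPC dialogue, you can sanity-check your environment with a quick preflight script. The snippet below is a minimal sketch, not part of MAID itself: it only checks that the relevant environment variables are set and that an Ollama server answers on the /api/tags endpoint used later in this guide.

import os
import httpx

def preflight() -> None:
    """Report which providers look usable from the current environment."""
    if os.environ.get("MAID_AI_ANTHROPIC_API_KEY"):
        print("Anthropic: API key found")
    else:
        print("Anthropic: MAID_AI_ANTHROPIC_API_KEY not set")

    if os.environ.get("MAID_AI_OPENAI_API_KEY"):
        print("OpenAI: API key found")
    else:
        print("OpenAI: MAID_AI_OPENAI_API_KEY not set")

    host = os.environ.get("MAID_AI_OLLAMA_HOST", "http://localhost:11434")
    try:
        httpx.get(f"{host}/api/tags", timeout=2.0).raise_for_status()
        print(f"Ollama: server reachable at {host}")
    except httpx.HTTPError:
        print(f"Ollama: no server at {host} (run 'ollama serve')")

preflight()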

Testing with Anthropic (Claude)

Anthropic Claude is the default and recommended provider for production use.

Setup

  1. Get an API Key
     • Visit console.anthropic.com
     • Create an account and generate an API key
     • Note: API usage incurs costs

  2. Configure the Environment

Create or update your .env file:

# Required
MAID_AI_ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

# Optional: Override default model
MAID_AI_ANTHROPIC_MODEL=claude-sonnet-4-20250514

# Optional: Set as default provider (it already is by default)
MAID_AI_DEFAULT_PROVIDER=anthropic

  3. Install the Package
    uv sync --extra anthropic
    # or
    pip install anthropic
    

Available Models

| Model | Description | Use Case |
| --- | --- | --- |
| claude-sonnet-4-20250514 | Latest Sonnet (default) | Production, best balance |
| claude-opus-4-20250514 | Opus, highest quality | Complex characters |
| claude-3-5-haiku-20241022 | Haiku, fastest | High-traffic, simple NPCs |

Test Commands

Start the server and test:

# Start the server
uv run maid server start --debug

# In another terminal, connect via telnet
telnet localhost 4000

# After logging in and selecting a character:
> talk bartender Hello! What's the news today?
> ask guard about the curfew
> greet merchant

Programmatic Testing

import asyncio
from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_anthropic():
    provider = AnthropicProvider(
        api_key="sk-ant-api03-your-key-here",
        default_model="claude-sonnet-4-20250514",
    )

    messages = [
        Message.system("You are a gruff dwarf blacksmith named Grumbar."),
        Message.user("Hello! Do you have any weapons for sale?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))

    print(f"Response: {result.content}")
    print(f"Tokens used: {result.usage}")

asyncio.run(test_anthropic())

Streaming Test

async def test_anthropic_streaming():
    provider = AnthropicProvider(api_key="sk-ant-api03-your-key-here")

    messages = [
        Message.system("You are a mysterious merchant."),
        Message.user("What rare items do you have?"),
    ]

    print("Streaming response: ", end="")
    async for chunk in provider.complete_streaming(messages):
        print(chunk, end="", flush=True)
    print()

asyncio.run(test_anthropic_streaming())

Testing with OpenAI (GPT-4)

OpenAI's GPT models provide an alternative for projects that prefer or require the OpenAI platform.

Setup

  1. Get an API Key
     • Visit platform.openai.com
     • Create an account and generate an API key
     • Note: API usage incurs costs

  2. Configure the Environment

# Required
MAID_AI_OPENAI_API_KEY=sk-your-key-here

# Optional: Override default model
MAID_AI_OPENAI_MODEL=gpt-4o

# Required if using as primary provider
MAID_AI_DEFAULT_PROVIDER=openai

  3. Install the Package
    uv sync --extra openai
    # or
    pip install openai
    

Available Models

| Model | Description | Use Case |
| --- | --- | --- |
| gpt-4o | Latest GPT-4o (default) | Production, best quality |
| gpt-4o-mini | Smaller, faster | High-traffic, cost-sensitive |
| gpt-4-turbo | GPT-4 Turbo | Alternative to gpt-4o |
| gpt-3.5-turbo | Fastest, cheapest | Simple NPCs, testing |

Test Commands

# Ensure OpenAI is configured
export MAID_AI_DEFAULT_PROVIDER=openai
export MAID_AI_OPENAI_API_KEY=sk-your-key-here

# Start the server
uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk bartender What ales do you have on tap?

Programmatic Testing

import asyncio
from maid_engine.ai.providers.openai import OpenAIProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_openai():
    provider = OpenAIProvider(
        api_key="sk-your-key-here",
        default_model="gpt-4o",
    )

    messages = [
        Message.system("You are a wise old sage in a fantasy library."),
        Message.user("What can you tell me about dragons?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=200,
        temperature=0.6,
    ))

    print(f"Response: {result.content}")
    print(f"Model used: {result.model}")
    print(f"Tokens: {result.usage}")

asyncio.run(test_openai())

Testing with Ollama (Local)

Ollama runs LLMs locally on your machine, making it ideal for development and offline testing.

Setup

  1. Install Ollama

Visit ollama.ai and download for your platform:

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows: Download from website

  2. Start the Ollama Server

    ollama serve
    

  3. Pull a Model

    # Recommended models for NPC dialogue
    ollama pull llama3.2       # Good balance of speed/quality
    ollama pull llama3.1       # Better quality, slower
    ollama pull mistral        # Fast, good for simple NPCs
    ollama pull phi3           # Very fast, lightweight
    

  4. Configure the Environment

    # Optional: Ollama host (defaults to localhost:11434)
    MAID_AI_OLLAMA_HOST=http://localhost:11434
    
    # Optional: Default model
    MAID_AI_OLLAMA_MODEL=llama3.2
    
    # Required if using as primary provider
    MAID_AI_DEFAULT_PROVIDER=ollama
    

Available Models

Models must be pulled before use. Common options:

| Model | Size | Speed | Quality | Command |
| --- | --- | --- | --- | --- |
| llama3.2 | 2GB | Fast | Good | ollama pull llama3.2 |
| llama3.1 | 4.7GB | Medium | Better | ollama pull llama3.1 |
| llama3.1:70b | 40GB | Slow | Best | ollama pull llama3.1:70b |
| mistral | 4.1GB | Fast | Good | ollama pull mistral |
| mixtral | 26GB | Slow | Excellent | ollama pull mixtral |
| phi3 | 2.2GB | Fastest | Basic | ollama pull phi3 |
| codellama | 3.8GB | Medium | Good | ollama pull codellama |

Test Commands

# Ensure Ollama is running
ollama serve

# In another terminal, configure and start MAID
export MAID_AI_DEFAULT_PROVIDER=ollama
export MAID_AI_OLLAMA_MODEL=llama3.2

uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk guard What are the town's laws?

Programmatic Testing

import asyncio
from maid_engine.ai.providers.ollama import OllamaProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_ollama():
    provider = OllamaProvider(
        host="http://localhost:11434",
        default_model="llama3.2",
    )

    # Check if server is running
    if not await provider.is_available():
        print("Ollama server not available. Run 'ollama serve' first.")
        return

    # List installed models
    models = await provider.list_local_models()
    print(f"Installed models: {models}")

    # Test completion
    messages = [
        Message.system("You are a friendly tavern keeper."),
        Message.user("What's on the menu today?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))

    print(f"Response: {result.content}")

asyncio.run(test_ollama())

Troubleshooting Ollama

Server not starting:

# Check if already running
curl http://localhost:11434/api/tags

# Kill existing process and restart
pkill ollama
ollama serve

Model not found:

# List installed models
ollama list

# Pull the model you need
ollama pull llama3.2

Out of memory:

  • Use a smaller model (phi3, llama3.2)
  • Close other applications
  • Consider using a cloud provider instead

CLI Testing Tool

MAID includes a CLI command for testing AI providers directly:

# Test default provider
uv run maid dev test-ai "Tell me about yourself"

# Test specific provider
uv run maid dev test-ai "Hello!" --provider anthropic
uv run maid dev test-ai "Hello!" --provider openai
uv run maid dev test-ai "Hello!" --provider ollama

# Test with custom settings
uv run maid dev test-ai "What's the weather?" --temperature 0.9 --max-tokens 200

Verification Checklist

Use this checklist to verify your AI dialogue setup is working correctly:

Provider Configuration

  • [ ] API key is set (for Anthropic/OpenAI) or server is running (Ollama)
  • [ ] Provider package is installed (anthropic, openai, or httpx)
  • [ ] Default provider is configured in environment
  • [ ] Test prompt returns a response

Basic Dialogue

  • [ ] talk <npc> <message> returns AI response
  • [ ] ask <npc> about <topic> returns relevant response
  • [ ] greet <npc> initiates conversation
  • [ ] Response streams correctly (appears word by word)

Conversation History

  • [ ] NPC remembers context from previous messages
  • [ ] conversations command lists active conversations
  • [ ] endconversation <npc> properly ends conversation
  • [ ] Conversation times out after configured period
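
Conceptually, conversation history is just the prior turns replayed to the provider on each request. The sketch below is illustrative only: it assumes a Message.assistant constructor analogous to the Message.system and Message.user constructors shown earlier, and in practice the dialogue system manages this history for you rather than you building lists by hand.

from maid_engine.ai.providers.base import Message

# Hypothetical hand-built history: each new turn is appended, so the
# provider sees earlier exchanges and can stay consistent with them.
history = [
    Message.system("You are a gruff dwarf blacksmith named Grumbar."),
    Message.user("Do you have any weapons for sale?"),
    Message.assistant("Aye, a few blades fresh off the anvil."),  # assumed constructor
    Message.user("How much for the cheapest one?"),
]

# Passing the full list lets the NPC "remember" the earlier question:
# result = await provider.complete(history, CompletionOptions(max_tokens=150))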

Rate Limiting

  • [ ] Rapid requests show "wait" message
  • [ ] NPC cooldown prevents spam
  • [ ] Token budget limits are enforced
  • [ ] Rate limit messages are player-friendly
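
To see what a per-player requests-per-minute limit means in practice, here is a toy sliding-window limiter. It is not MAID's implementation (the engine enforces the limits configured via the MAID_AI_DIALOGUE_* settings shown later); it only illustrates the behavior this checklist verifies.

import time
from collections import defaultdict, deque

class PerPlayerRateLimiter:
    """Toy sliding-window limiter: at most `rpm` requests per player per minute."""

    def __init__(self, rpm: int = 10):
        self.rpm = rpm
        self.requests: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, player_id: str) -> bool:
        now = time.monotonic()
        window = self.requests[player_id]
        # Drop timestamps older than 60 seconds.
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= self.rpm:
            return False  # the player should see a friendly "please wait" message
        window.append(now)
        return True

limiter = PerPlayerRateLimiter(rpm=10)
print(limiter.allow("player-1"))  # True until the per-minute budget is used up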

Content Filtering

  • [ ] NPCs stay in character
  • [ ] NPCs don't acknowledge being AI
  • [ ] Inappropriate requests are deflected
  • [ ] wont_discuss topics are avoided
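
These behaviors are configured per NPC. A rough sketch follows; the import path is assumed, provider_name and model_name match the runtime-switching examples later in this guide, and wont_discuss is the field named in the checklist above.

from maid_engine.components.dialogue import DialogueComponent  # import path assumed

guard_npc = DialogueComponent(
    provider_name="anthropic",
    model_name="claude-3-5-haiku-20241022",
    wont_discuss=["out-of-game topics", "being an AI"],  # topics the NPC deflects in character
    # ... other settings
)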

Fallback Behavior

  • [ ] Setting ai_enabled=False uses fallback responses
  • [ ] API errors fall back to fallback_response
  • [ ] Disabling global AI uses fallback responses
  • [ ] Non-existent NPCs show appropriate error
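
Fallbacks can be exercised without touching the provider at all. A minimal sketch, using the ai_enabled and fallback_response names from the checklist above (the import path is assumed):

from maid_engine.components.dialogue import DialogueComponent  # import path assumed

static_npc = DialogueComponent(
    ai_enabled=False,  # skip the LLM entirely
    fallback_response="The old fisherman just grunts and keeps mending his net.",
    # ... other settings
)

# With ai_enabled=False (or when the provider errors out), players should
# always receive fallback_response instead of an AI-generated reply.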

Context Injection

  • [ ] World context affects responses (time of day, weather)
  • [ ] Location context affects responses (room description)
  • [ ] Player context affects responses (name, level)
  • [ ] Disabling context options reduces prompt size
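
Context injection boils down to folding world, location, and player details into the system prompt before the request is sent. The helper below is purely illustrative (it is not MAID's prompt builder); it shows why disabling context options reduces prompt size.

from maid_engine.ai.providers.base import Message

def build_system_prompt(persona: str, world: str | None = None,
                        location: str | None = None, player: str | None = None) -> Message:
    """Assemble a system message; omitted context keeps the prompt shorter."""
    parts = [persona]
    if world:
        parts.append(f"World: {world}")
    if location:
        parts.append(f"Location: {location}")
    if player:
        parts.append(f"Speaking with: {player}")
    return Message.system("\n".join(parts))

# Full context makes a longer, richer prompt...
full = build_system_prompt("You are a tavern keeper.", world="Rainy autumn evening",
                           location="The common room of a crowded tavern", player="Lyra, level 3")
# ...while persona-only keeps token usage down.
minimal = build_system_prompt("You are a tavern keeper.")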

Performance Testing

Response Time Benchmarks

Test response times for different configurations:

import asyncio
import time
from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def benchmark_provider(provider, iterations=5):
    messages = [
        Message.system("You are a tavern keeper. Respond briefly."),
        Message.user("What drinks do you have?"),
    ]
    options = CompletionOptions(max_tokens=100, temperature=0.7)

    times = []
    for i in range(iterations):
        start = time.time()
        await provider.complete(messages, options)
        elapsed = time.time() - start
        times.append(elapsed)
        print(f"  Iteration {i+1}: {elapsed:.2f}s")

    avg = sum(times) / len(times)
    print(f"  Average: {avg:.2f}s")
    return avg

async def main():
    # Create a separate provider per model so each run actually uses that model.
    for model in ("claude-3-5-haiku-20241022", "claude-sonnet-4-20250514"):
        provider = AnthropicProvider(api_key="sk-ant-...", default_model=model)
        print(f"\nTesting {model}:")
        await benchmark_provider(provider)

asyncio.run(main())

Expected Response Times

| Provider | Model | Typical Response Time |
| --- | --- | --- |
| Anthropic | claude-3-5-haiku | 0.5-1.5s |
| Anthropic | claude-sonnet-4 | 1-3s |
| Anthropic | claude-opus-4 | 2-5s |
| OpenAI | gpt-4o-mini | 0.5-1.5s |
| OpenAI | gpt-4o | 1-3s |
| Ollama | llama3.2 (local) | 0.5-2s |
| Ollama | mistral (local) | 0.3-1s |

Note: Times vary based on network latency, server load, and prompt length.

Switching Providers at Runtime

You can configure different providers for different NPCs:

# Use Anthropic for important NPCs
important_npc = DialogueComponent(
    provider_name="anthropic",
    model_name="claude-sonnet-4-20250514",
    # ... other settings
)

# Use local Ollama for background NPCs
background_npc = DialogueComponent(
    provider_name="ollama",
    model_name="phi3",
    # ... other settings
)

# Use OpenAI for variety
special_npc = DialogueComponent(
    provider_name="openai",
    model_name="gpt-4o",
    # ... other settings
)

This allows you to:

  • Use expensive models for key characters
  • Use fast/cheap models for background NPCs
  • Test different providers in production
  • Fall back to local models if the API is down

Security Considerations

API Key Protection

  • Never commit API keys to version control
  • Use environment variables or .env files
  • Add .env to .gitignore
  • Rotate keys if they may have been exposed

Rate Limiting

Configure rate limits to prevent abuse and cost overruns:

# Conservative limits for production
MAID_AI_DIALOGUE_GLOBAL_RATE_LIMIT_RPM=60
MAID_AI_DIALOGUE_PER_PLAYER_RATE_LIMIT_RPM=10
MAID_AI_DIALOGUE_DAILY_TOKEN_BUDGET=50000
MAID_AI_DIALOGUE_PER_PLAYER_DAILY_BUDGET=2000

Logging

Be careful with conversation logging: logs may contain player messages and other personal information.

# Only enable in development
MAID_AI_DIALOGUE_LOG_CONVERSATIONS=false