AI Provider Testing Guide

This guide covers how to manually test the AI dialogue system with different LLM providers. Use this to verify your configuration and ensure NPCs respond correctly.

Supported Providers

MAID supports three AI providers out of the box:

| Provider | Name | Best For | Requirements |
| --- | --- | --- | --- |
| Anthropic Claude | anthropic | Production use, quality responses | API key, anthropic package |
| OpenAI GPT | openai | Alternative production option | API key, openai package |
| Ollama | ollama | Local development, offline use | Ollama server, httpx package |
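
Before wiring a provider into NPC dialogue, you can sanity-check your environment with a quick preflight script. The snippet below is a minimal sketch, not part of MAID itself: it only checks that the relevant environment variables are set and that an Ollama server answers on the /api/tags endpoint used later in this guide.

import os
import httpx

def preflight() -> None:
    """Report which providers look usable from the current environment."""
    if os.environ.get("MAID_AI_ANTHROPIC_API_KEY"):
        print("Anthropic: API key found")
    else:
        print("Anthropic: MAID_AI_ANTHROPIC_API_KEY not set")

    if os.environ.get("MAID_AI_OPENAI_API_KEY"):
        print("OpenAI: API key found")
    else:
        print("OpenAI: MAID_AI_OPENAI_API_KEY not set")

    host = os.environ.get("MAID_AI_OLLAMA_HOST", "http://localhost:11434")
    try:
        httpx.get(f"{host}/api/tags", timeout=2.0).raise_for_status()
        print(f"Ollama: server reachable at {host}")
    except httpx.HTTPError:
        print(f"Ollama: no server at {host} (run 'ollama serve')")

preflight()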

Testing with Anthropic (Claude)

Anthropic Claude is the default and recommended provider for production use.

Setup

  1. Get an API Key
     • Visit console.anthropic.com
     • Create an account and generate an API key
     • Note: API usage incurs costs

  2. Configure the Environment

Create or update your .env file:

# Required
MAID_AI_ANTHROPIC_API_KEY=sk-ant-api03-your-key-here

# Optional: Override default model
MAID_AI_ANTHROPIC_MODEL=claude-sonnet-4-20250514

# Optional: Set as default provider (it already is by default)
MAID_AI_DEFAULT_PROVIDER=anthropic

  3. Install the Package
    uv sync --extra anthropic
    # or
    pip install anthropic
    

Available Models

| Model | Description | Use Case |
| --- | --- | --- |
| claude-sonnet-4-20250514 | Latest Sonnet (default) | Production, best balance |
| claude-opus-4-20250514 | Opus, highest quality | Complex characters |
| claude-3-5-haiku-20241022 | Haiku, fastest | High-traffic, simple NPCs |

Test Commands

Start the server and test:

# Start the server
uv run maid server start --debug

# In another terminal, connect via telnet
telnet localhost 4000

# After logging in and selecting a character:
> talk bartender Hello! What's the news today?
> ask guard about the curfew
> greet merchant

Programmatic Testing

import asyncio
from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_anthropic():
    provider = AnthropicProvider(
        api_key="sk-ant-api03-your-key-here",
        default_model="claude-sonnet-4-20250514",
    )

    messages = [
        Message.system("You are a gruff dwarf blacksmith named Grumbar."),
        Message.user("Hello! Do you have any weapons for sale?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))

    print(f"Response: {result.content}")
    print(f"Tokens used: {result.usage}")

asyncio.run(test_anthropic())

Streaming Test

async def test_anthropic_streaming():
    provider = AnthropicProvider(api_key="sk-ant-api03-your-key-here")

    messages = [
        Message.system("You are a mysterious merchant."),
        Message.user("What rare items do you have?"),
    ]

    print("Streaming response: ", end="")
    async for chunk in provider.complete_streaming(messages):
        print(chunk, end="", flush=True)
    print()

asyncio.run(test_anthropic_streaming())

Testing with OpenAI (GPT-4)

OpenAI's GPT models provide an alternative for projects that prefer or require the OpenAI platform.

Setup

  1. Get an API Key
     • Visit platform.openai.com
     • Create an account and generate an API key
     • Note: API usage incurs costs

  2. Configure the Environment

# Required
MAID_AI_OPENAI_API_KEY=sk-your-key-here

# Optional: Override default model
MAID_AI_OPENAI_MODEL=gpt-4o

# Required if using as primary provider
MAID_AI_DEFAULT_PROVIDER=openai

  3. Install the Package
    uv sync --extra openai
    # or
    pip install openai
    

Available Models

| Model | Description | Use Case |
| --- | --- | --- |
| gpt-4o | Latest GPT-4o (default) | Production, best quality |
| gpt-4o-mini | Smaller, faster | High-traffic, cost-sensitive |
| gpt-4-turbo | GPT-4 Turbo | Alternative to gpt-4o |
| gpt-3.5-turbo | Fastest, cheapest | Simple NPCs, testing |

Test Commands

# Ensure OpenAI is configured
export MAID_AI_DEFAULT_PROVIDER=openai
export MAID_AI_OPENAI_API_KEY=sk-your-key-here

# Start the server
uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk bartender What ales do you have on tap?

Programmatic Testing

import asyncio
from maid_engine.ai.providers.openai import OpenAIProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_openai():
    provider = OpenAIProvider(
        api_key="sk-your-key-here",
        default_model="gpt-4o",
    )

    messages = [
        Message.system("You are a wise old sage in a fantasy library."),
        Message.user("What can you tell me about dragons?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=200,
        temperature=0.6,
    ))

    print(f"Response: {result.content}")
    print(f"Model used: {result.model}")
    print(f"Tokens: {result.usage}")

asyncio.run(test_openai())

Testing with Ollama (Local)

Ollama runs LLMs locally on your machine, making it ideal for development and offline testing.

Setup

  1. Install Ollama

Visit ollama.ai and download for your platform:

# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows: Download from website

  2. Start the Ollama Server

    ollama serve
    

  3. Pull a Model

    # Recommended models for NPC dialogue
    ollama pull llama3.2       # Good balance of speed/quality
    ollama pull llama3.1       # Better quality, slower
    ollama pull mistral        # Fast, good for simple NPCs
    ollama pull phi3           # Very fast, lightweight
    

  4. Configure the Environment

    # Optional: Ollama host (defaults to localhost:11434)
    MAID_AI_OLLAMA_HOST=http://localhost:11434
    
    # Optional: Default model
    MAID_AI_OLLAMA_MODEL=llama3.2
    
    # Required if using as primary provider
    MAID_AI_DEFAULT_PROVIDER=ollama
    

Available Models

Models must be pulled before use. Common options:

| Model | Size | Speed | Quality | Command |
| --- | --- | --- | --- | --- |
| llama3.2 | 2GB | Fast | Good | ollama pull llama3.2 |
| llama3.1 | 4.7GB | Medium | Better | ollama pull llama3.1 |
| llama3.1:70b | 40GB | Slow | Best | ollama pull llama3.1:70b |
| mistral | 4.1GB | Fast | Good | ollama pull mistral |
| mixtral | 26GB | Slow | Excellent | ollama pull mixtral |
| phi3 | 2.2GB | Fastest | Basic | ollama pull phi3 |
| codellama | 3.8GB | Medium | Good | ollama pull codellama |

Test Commands

# Ensure Ollama is running
ollama serve

# In another terminal, configure and start MAID
export MAID_AI_DEFAULT_PROVIDER=ollama
export MAID_AI_OLLAMA_MODEL=llama3.2

uv run maid server start --debug

# Connect and test
telnet localhost 4000
> talk guard What are the town's laws?

Programmatic Testing

import asyncio
from maid_engine.ai.providers.ollama import OllamaProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def test_ollama():
    provider = OllamaProvider(
        host="http://localhost:11434",
        default_model="llama3.2",
    )

    # Check if server is running
    if not await provider.is_available():
        print("Ollama server not available. Run 'ollama serve' first.")
        return

    # List installed models
    models = await provider.list_local_models()
    print(f"Installed models: {models}")

    # Test completion
    messages = [
        Message.system("You are a friendly tavern keeper."),
        Message.user("What's on the menu today?"),
    ]

    result = await provider.complete(messages, CompletionOptions(
        max_tokens=150,
        temperature=0.7,
    ))

    print(f"Response: {result.content}")

asyncio.run(test_ollama())

Troubleshooting Ollama

Server not starting:

# Check if already running
curl http://localhost:11434/api/tags

# Kill existing process and restart
pkill ollama
ollama serve

Model not found:

# List installed models
ollama list

# Pull the model you need
ollama pull llama3.2

Out of memory:

  • Use a smaller model (phi3, llama3.2)
  • Close other applications
  • Consider using a cloud provider instead

CLI Testing Tool

MAID includes a CLI command for testing AI providers directly:

# Test default provider
uv run maid dev test-ai "Tell me about yourself"

# Test specific provider
uv run maid dev test-ai "Hello!" --provider anthropic
uv run maid dev test-ai "Hello!" --provider openai
uv run maid dev test-ai "Hello!" --provider ollama

# Test with custom settings
uv run maid dev test-ai "What's the weather?" --temperature 0.9 --max-tokens 200

Verification Checklist

Use this checklist to verify your AI dialogue setup is working correctly:

Provider Configuration

  • [ ] API key is set (for Anthropic/OpenAI) or server is running (Ollama)
  • [ ] Provider package is installed (anthropic, openai, or httpx)
  • [ ] Default provider is configured in environment
  • [ ] Test prompt returns a response

Basic Dialogue

  • [ ] talk <npc> <message> returns AI response
  • [ ] ask <npc> about <topic> returns relevant response
  • [ ] greet <npc> initiates conversation
  • [ ] Response streams correctly (appears word by word)

Conversation History

  • [ ] NPC remembers context from previous messages
  • [ ] conversations command lists active conversations
  • [ ] endconversation <npc> properly ends conversation
  • [ ] Conversation times out after configured period
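
Conceptually, conversation history is just the prior turns replayed to the provider on each request. The sketch below is illustrative only: it assumes a Message.assistant constructor analogous to the Message.system and Message.user constructors shown earlier, and in practice the dialogue system manages this history for you rather than you building lists by hand.

from maid_engine.ai.providers.base import Message

# Hypothetical hand-built history: each new turn is appended, so the
# provider sees earlier exchanges and can stay consistent with them.
history = [
    Message.system("You are a gruff dwarf blacksmith named Grumbar."),
    Message.user("Do you have any weapons for sale?"),
    Message.assistant("Aye, a few blades fresh off the anvil."),  # assumed constructor
    Message.user("How much for the cheapest one?"),
]

# Passing the full list lets the NPC "remember" the earlier question:
# result = await provider.complete(history, CompletionOptions(max_tokens=150))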

Rate Limiting

  • [ ] Rapid requests show "wait" message
  • [ ] NPC cooldown prevents spam
  • [ ] Token budget limits are enforced
  • [ ] Rate limit messages are player-friendly
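
To see what a per-player requests-per-minute limit means in practice, here is a toy sliding-window limiter. It is not MAID's implementation (the engine enforces the limits configured via the MAID_AI_DIALOGUE_* settings shown later); it only illustrates the behavior this checklist verifies.

import time
from collections import defaultdict, deque

class PerPlayerRateLimiter:
    """Toy sliding-window limiter: at most `rpm` requests per player per minute."""

    def __init__(self, rpm: int = 10):
        self.rpm = rpm
        self.requests: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, player_id: str) -> bool:
        now = time.monotonic()
        window = self.requests[player_id]
        # Drop timestamps older than 60 seconds.
        while window and now - window[0] > 60:
            window.popleft()
        if len(window) >= self.rpm:
            return False  # the player should see a friendly "please wait" message
        window.append(now)
        return True

limiter = PerPlayerRateLimiter(rpm=10)
print(limiter.allow("player-1"))  # True until the per-minute budget is used up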

Content Filtering

  • [ ] NPCs stay in character
  • [ ] NPCs don't acknowledge being AI
  • [ ] Inappropriate requests are deflected
  • [ ] wont_discuss topics are avoided
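
These behaviors are configured per NPC. A rough sketch follows; the import path is assumed, provider_name and model_name match the runtime-switching examples later in this guide, and wont_discuss is the field named in the checklist above.

from maid_engine.components.dialogue import DialogueComponent  # import path assumed

guard_npc = DialogueComponent(
    provider_name="anthropic",
    model_name="claude-3-5-haiku-20241022",
    wont_discuss=["out-of-game topics", "being an AI"],  # topics the NPC deflects in character
    # ... other settings
)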

Fallback Behavior

  • [ ] Setting ai_enabled=False uses fallback responses
  • [ ] API errors fall back to fallback_response
  • [ ] Disabling global AI uses fallback responses
  • [ ] Non-existent NPCs show appropriate error
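
Fallbacks can be exercised without touching the provider at all. A minimal sketch, using the ai_enabled and fallback_response names from the checklist above (the import path is assumed):

from maid_engine.components.dialogue import DialogueComponent  # import path assumed

static_npc = DialogueComponent(
    ai_enabled=False,  # skip the LLM entirely
    fallback_response="The old fisherman just grunts and keeps mending his net.",
    # ... other settings
)

# With ai_enabled=False (or when the provider errors out), players should
# always receive fallback_response instead of an AI-generated reply.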

Context Injection

  • [ ] World context affects responses (time of day, weather)
  • [ ] Location context affects responses (room description)
  • [ ] Player context affects responses (name, level)
  • [ ] Disabling context options reduces prompt size
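
Context injection boils down to folding world, location, and player details into the system prompt before the request is sent. The helper below is purely illustrative (it is not MAID's prompt builder); it shows why disabling context options reduces prompt size.

from maid_engine.ai.providers.base import Message

def build_system_prompt(persona: str, world: str | None = None,
                        location: str | None = None, player: str | None = None) -> Message:
    """Assemble a system message; omitted context keeps the prompt shorter."""
    parts = [persona]
    if world:
        parts.append(f"World: {world}")
    if location:
        parts.append(f"Location: {location}")
    if player:
        parts.append(f"Speaking with: {player}")
    return Message.system("\n".join(parts))

# Full context makes a longer, richer prompt...
full = build_system_prompt("You are a tavern keeper.", world="Rainy autumn evening",
                           location="The common room of a crowded tavern", player="Lyra, level 3")
# ...while persona-only keeps token usage down.
minimal = build_system_prompt("You are a tavern keeper.")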

Performance Testing

Response Time Benchmarks

Test response times for different configurations:

import asyncio
import time
from maid_engine.ai.providers.anthropic import AnthropicProvider
from maid_engine.ai.providers.base import Message, CompletionOptions

async def benchmark_provider(provider, iterations=5):
    messages = [
        Message.system("You are a tavern keeper. Respond briefly."),
        Message.user("What drinks do you have?"),
    ]
    options = CompletionOptions(max_tokens=100, temperature=0.7)

    times = []
    for i in range(iterations):
        start = time.time()
        await provider.complete(messages, options)
        elapsed = time.time() - start
        times.append(elapsed)
        print(f"  Iteration {i+1}: {elapsed:.2f}s")

    avg = sum(times) / len(times)
    print(f"  Average: {avg:.2f}s")
    return avg

async def main():
    # Create a separate provider per model so each run actually uses that model.
    for model in ("claude-3-5-haiku-20241022", "claude-sonnet-4-20250514"):
        provider = AnthropicProvider(api_key="sk-ant-...", default_model=model)
        print(f"\nTesting {model}:")
        await benchmark_provider(provider)

asyncio.run(main())

Expected Response Times

| Provider | Model | Typical Response Time |
| --- | --- | --- |
| Anthropic | claude-3-5-haiku | 0.5-1.5s |
| Anthropic | claude-sonnet-4 | 1-3s |
| Anthropic | claude-opus-4 | 2-5s |
| OpenAI | gpt-4o-mini | 0.5-1.5s |
| OpenAI | gpt-4o | 1-3s |
| Ollama | llama3.2 (local) | 0.5-2s |
| Ollama | mistral (local) | 0.3-1s |

Note: Times vary based on network latency, server load, and prompt length.

Switching Providers at Runtime

You can configure different providers for different NPCs:

# Use Anthropic for important NPCs
important_npc = DialogueComponent(
    provider_name="anthropic",
    model_name="claude-sonnet-4-20250514",
    # ... other settings
)

# Use local Ollama for background NPCs
background_npc = DialogueComponent(
    provider_name="ollama",
    model_name="phi3",
    # ... other settings
)

# Use OpenAI for variety
special_npc = DialogueComponent(
    provider_name="openai",
    model_name="gpt-4o",
    # ... other settings
)

This allows you to:

  • Use expensive models for key characters
  • Use fast/cheap models for background NPCs
  • Test different providers in production
  • Fall back to local models if the API is down

Security Considerations

API Key Protection

  • Never commit API keys to version control
  • Use environment variables or .env files
  • Add .env to .gitignore
  • Rotate keys if they may have been exposed

Rate Limiting

Configure rate limits to prevent abuse and cost overruns:

# Conservative limits for production
MAID_AI_DIALOGUE_GLOBAL_RATE_LIMIT_RPM=60
MAID_AI_DIALOGUE_PER_PLAYER_RATE_LIMIT_RPM=10
MAID_AI_DIALOGUE_DAILY_TOKEN_BUDGET=50000
MAID_AI_DIALOGUE_PER_PLAYER_DAILY_BUDGET=2000

Logging

Be careful with conversation logging: logs may contain player messages and other personal information.

# Only enable in development
MAID_AI_DIALOGUE_LOG_CONVERSATIONS=false