AI Players Architecture

This guide explains how AI Players think, remember, and act. Understanding the architecture helps with debugging, tuning, and extending AI Player behavior.

Three-Layer Cognitive Architecture

AI Players use a hybrid control architecture inspired by Brooks' subsumption architecture (1986), with three layers running at different speeds and costs:

┌─────────────────────────────────────────────────────────┐
│  Layer 3: Deliberative                                  │
│  Independent asyncio.Task · 15-min cadence · expensive  │
│  Strategic planning, reflection, goal revision           │
│                                                         │
│  Writes goals/plans to shared state ──────────┐         │
└─────────────────────────────────────────────────┼────────┘
                                                  │ reads
┌─────────────────────────────────────────────────┼────────┐
│  Layer 2: Executive                             ▼        │
│  1–3 s cadence · cheap LLM / rules                      │
│  Perceive → Memory → Plan check → Act                   │
│                                                         │
│  Output suppressed when Layer 1 fires ──┐               │
└──────────────────────────────────────────┼───────────────┘
                                           │ inhibits
┌──────────────────────────────────────────┼───────────────┐
│  Layer 1: Reactive                       ▼               │
│  Every game tick · <10ms · zero LLM                     │
│  FSM reflexes: survival, combat, social                 │
└─────────────────────────────────────────────────────────┘

The key insight: the fast loop never waits for the slow loop. Layer 3 runs as an independent asyncio.Task and posts updated goals and plans to shared state. Layers 1 and 2 read that state but never block on it.
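A minimal sketch of this non-blocking handoff, assuming nothing about the real codebase beyond what is described above (the `SharedState` class, the stand-in cadences, and the goal strings are all illustrative):

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class SharedState:
    # Goals posted by Layer 3; read (never awaited) by Layers 1 and 2.
    goals: list = field(default_factory=lambda: ["explore"])

async def deliberative_loop(state: SharedState, tick: asyncio.Event):
    # Layer 3: slow, expensive. Posts results to shared state; the real
    # loop sleeps ~15 minutes between iterations.
    await tick.wait()                       # stand-in for the 15-min cadence
    state.goals = ["reach level 5"]         # stand-in for an expensive LLM call

async def executive_loop(state: SharedState, tick: asyncio.Event, ticks=4):
    # Layer 2: reads the newest goals each tick without awaiting Layer 3.
    seen = []
    for i in range(ticks):
        seen.append(state.goals[0])         # plain read -- never blocks
        if i == 1:
            tick.set()                      # let Layer 3 run mid-stream
        await asyncio.sleep(0)              # yield to other tasks
    return seen

async def main():
    state, tick = SharedState(), asyncio.Event()
    layer3 = asyncio.create_task(deliberative_loop(state, tick))
    seen = await executive_loop(state, tick)
    await layer3
    return seen
```

The fast loop keeps acting on the old goals until the slow task posts new ones; no tick ever stalls waiting for the deliberative layer.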

Layer 1 — Reactive (Every Tick)

The reactive controller runs FSM-based behaviors on every game tick with zero LLM cost. When a reactive trigger fires, it suppresses Layer 2 output for that tick (Brooks-style inhibition).

Priority order (highest first):

  1. Survival — Critical HP (<15%). Flee or heal based on personality.
  2. Combat — Fight-or-flight FSM. Aggression threshold from personality.
  3. Social — Greeting reflex when players arrive (extraverted personalities only).

The combat FSM tracks states: IDLE → ENGAGED → FLEEING → RECOVERING → IDLE. Flee threshold scales with the personality's neuroticism dimension (0.3 for brave, 0.6 for anxious).
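The FSM above can be sketched as a pure per-tick transition function. This is a sketch, not the real implementation; the linear neuroticism-to-threshold mapping is an assumption chosen to reproduce the 0.3/0.6 endpoints mentioned above:

```python
from enum import Enum, auto

class CombatState(Enum):
    IDLE = auto()
    ENGAGED = auto()
    FLEEING = auto()
    RECOVERING = auto()

def flee_threshold(neuroticism: float) -> float:
    # Hypothetical linear mapping: brave (0.0) -> 0.3, anxious (1.0) -> 0.6.
    return 0.3 + 0.3 * neuroticism

def step(state: CombatState, hp_frac: float, in_combat: bool,
         neuroticism: float) -> CombatState:
    # One transition per game tick -- pure branching, zero LLM cost.
    if state is CombatState.IDLE:
        return CombatState.ENGAGED if in_combat else state
    if state is CombatState.ENGAGED:
        if hp_frac < flee_threshold(neuroticism):
            return CombatState.FLEEING
        return state if in_combat else CombatState.IDLE
    if state is CombatState.FLEEING:
        return state if in_combat else CombatState.RECOVERING
    # RECOVERING -> IDLE once HP is back above the flee threshold.
    return CombatState.IDLE if hp_frac >= flee_threshold(neuroticism) else state
```

Because the function takes the personality dimension as an input, the same FSM serves both brave and anxious agents without duplicated logic.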

Layer 2 — Executive (1–3s Cadence)

The main decision loop, running on a configurable cadence:

Perceive → World Model → Memory Encode → Plan Check → Act
  1. Perceive: Drain session output buffer. Parse via regex patterns and GMCP packet extraction.
  2. World Model: Integrate observations into the structured world model (map, inventory, status, entities).
  3. Memory: Encode significant observations as episodic memories.
  4. Plan Check: If the current plan is invalid or complete, trigger replanning.
  5. Act: Select the next command from the plan and inject it into the virtual session.
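The five steps above can be sketched as one executive tick. All types and thresholds here are illustrative stand-ins (the real parser, world model, and planner are far richer); the sketch only shows how the stages hand data to each other:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    text: str
    importance: int = 1      # 1-10, rated at encoding time

@dataclass
class Plan:
    commands: list
    def is_complete(self):
        return not self.commands
    def next_command(self):
        return self.commands.pop(0)

def executive_tick(raw_output, world, episodic, plan, replan):
    # 1. Perceive: parse drained session output into observations
    #    (toy importance rule standing in for real scoring).
    observations = [Observation(line, importance=8 if "goblin" in line else 2)
                    for line in raw_output]
    # 2. World model: fold observations into the structured model.
    world.extend(o.text for o in observations)
    # 3. Memory: encode only significant observations as episodic memories.
    episodic.extend(o for o in observations if o.importance >= 5)
    # 4. Plan check: replan when the current plan is exhausted.
    if plan.is_complete():
        plan = replan()                      # cheap LLM tier in practice
    # 5. Act: return the next command to inject into the virtual session.
    return plan, plan.next_command()
```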

After each action, the bug filing pipeline checks for anomalies (broken exits, command errors, state inconsistencies) and may file a structured bug report.

Layer 3 — Deliberative (Async)

Runs on its own schedule (default: every 15 minutes) as an independent asyncio.Task. Uses the expensive LLM tier for:

  • Reflection: Synthesize episodic memories into higher-level insights.
  • Strategic review: Re-evaluate whether current goals still make sense.
  • Goal revision: Generate new goals based on accumulated experience.

Memory System

AI Players maintain five memory layers, modeled after cognitive science research:

| Layer | Purpose | Capacity | Example |
|-------|---------|----------|---------|
| Working | Active context for current tick | 2000 tokens | "I'm in the tavern, talking to the barkeep" |
| Episodic | Past experiences | 500 entries | "I fought a goblin in the forest and lost" |
| Semantic | Learned facts | 200 entries | "The blacksmith sells swords" |
| Procedural | Learned command sequences | 100 entries | "To buy: go shop, list, buy sword" |
| Reflective | Meta-insights from reflection | 50 entries | "I should avoid the forest until I'm stronger" |

Retrieval Scoring

When the executive loop needs context, memories are scored with:

score = α · recency + β · importance + γ · relevance
  • Recency: Exponential decay (λ^age) — recent memories rank higher.
  • Importance: Rated 1–10 at encoding time — combat and death score high.
  • Relevance: Cosine similarity when embeddings are available, keyword overlap as fallback.
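The scoring formula can be sketched directly, using the keyword-overlap fallback for relevance (the weights, decay rate, and dictionary-based memory format here are illustrative assumptions, not the real defaults):

```python
def score(memory, now, query_terms, alpha=1.0, beta=1.0, gamma=1.0,
          decay=0.99):
    # score = alpha*recency + beta*importance + gamma*relevance
    recency = decay ** (now - memory["t"])        # exponential decay, lambda^age
    importance = memory["importance"] / 10        # normalize 1-10 to 0-1
    # Relevance fallback: keyword overlap with the query
    # (cosine similarity over embeddings when they are available).
    words = set(memory["text"].lower().split())
    relevance = len(words & query_terms) / max(len(query_terms), 1)
    return alpha * recency + beta * importance + gamma * relevance

def retrieve(memories, now, query, k=3):
    # Return the k best-scoring memories for the current context.
    terms = set(query.lower().split())
    return sorted(memories, key=lambda m: score(m, now, terms), reverse=True)[:k]
```

Note how the terms trade off: an old but important, relevant memory (a lost fight) can outrank a recent but trivial one.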

Consolidation and Forgetting

Episodic memories are periodically consolidated into semantic facts (e.g., many visits to a shop become "the shop is at coordinates 5,3"). Stale memories decay exponentially and are eventually forgotten, keeping retrieval sets small and costs low.
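Both mechanisms can be sketched in a few lines; the support count, decay rate, and retention floor below are illustrative parameters, not the system's actual values:

```python
from collections import Counter

def consolidate(episodic, min_support=3):
    # Repeated episodic observations graduate into semantic facts
    # (e.g. many shop visits become one "the shop is at 5,3" fact).
    counts = Counter(m["text"] for m in episodic)
    return [fact for fact, n in counts.items() if n >= min_support]

def decay_and_forget(memories, now, decay=0.99, floor=0.2):
    # Retention = importance weight times exponential recency decay;
    # entries below the floor are forgotten, keeping retrieval sets small.
    return [m for m in memories
            if (m["importance"] / 10) * decay ** (now - m["t"]) >= floor]
```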

Perception Pipeline

Game output flows through a multi-stage perception pipeline:

Raw session output ──→ Text Parser (regex patterns)
                        ├─ Room descriptions
                        ├─ Combat events
                        ├─ Chat messages
                        └─ System messages

GMCP packets ──────────→ Structured data extraction
                        ├─ Char.Status (HP, MP, level)
                        ├─ Room.Info (name, exits, contents)
                        └─ Comm.Channel (chat channels)

Combined ──────────────→ Observation sanitizer
                        └─ Importance scoring
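A sketch of the two input stages feeding a combined observation list. The regex pattern and the GMCP payload shapes here are simplified assumptions; real MUD output needs many more patterns:

```python
import json
import re

# Illustrative pattern -- real parsers maintain a library of these.
ROOM_RE = re.compile(r"^You are in (?P<room>.+)\.$")

def perceive(raw_lines, gmcp_packets):
    observations = []
    # Stage 1: regex parsing of plain text (rooms, combat, chat, system).
    for line in raw_lines:
        m = ROOM_RE.match(line)
        if m:
            observations.append(("room", m.group("room")))
    # Stage 2: structured extraction from GMCP packages (JSON payloads).
    for package, payload in gmcp_packets:
        if package == "Char.Status":
            observations.append(("status", json.loads(payload)))
    return observations
```

GMCP data needs no fuzzy parsing, which is why it is handled as a separate, lossless channel alongside the regex stage.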

Prompt Injection Defense

Player speech is wrapped in PLAYER_SPEECH boundary markers before being included in any LLM prompt. This prevents players from injecting instructions into an AI Player's cognitive pipeline through in-game chat.
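The exact marker syntax is internal to the system; a minimal sketch, assuming angle-bracket markers, shows the key detail that marker-lookalikes typed by the player must be stripped before the real boundaries are added:

```python
def wrap_player_speech(text: str) -> str:
    # Remove any boundary markers the player typed themselves, so they
    # cannot fake a "close" marker and smuggle instructions after it.
    sanitized = (text.replace("<PLAYER_SPEECH>", "")
                     .replace("</PLAYER_SPEECH>", ""))
    return f"<PLAYER_SPEECH>{sanitized}</PLAYER_SPEECH>"
```

Everything between the markers is then treated by the prompt as quoted data, never as instructions.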

Planning Hierarchy

Plans are structured in four levels:

| Level | Scope | Example |
|-------|-------|---------|
| Goal | Long-term objective | "Reach level 5" |
| Phase | Major milestone | "Complete the goblin quest" |
| Task | Concrete step | "Navigate to the goblin camp" |
| Action | Single command | `move north` |

Replanning triggers when: a task fails, the world model changes significantly, or Layer 3 posts a new strategic plan.
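The three triggers combine with a simple OR; this sketch assumes a change-score world model and a version counter on the strategic plan, neither of which is confirmed by the source:

```python
def should_replan(plan, world, strategic_version, last_seen_version):
    # Any single trigger forces a replan on the next executive tick.
    task_failed = plan.get("last_task_failed", False)
    world_changed = world.get("change_score", 0.0) > 0.5  # illustrative threshold
    new_strategy = strategic_version != last_seen_version  # Layer 3 posted a plan
    return task_failed or world_changed or new_strategy
```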

Safety Features

  • Prompt injection defense: PLAYER_SPEECH boundary markers isolate player input in LLM prompts.
  • Action blacklist: Certain commands (e.g., admin commands) are never generated.
  • Rate limiting: Configurable per-agent LLM call limits.
  • Stuck detection: Repeated identical actions trigger automatic replanning.
  • Sensitive action gate: High-impact actions require higher confidence thresholds.
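Stuck detection in particular is cheap to implement; a sketch with an assumed window of three identical actions (the real window size is not specified):

```python
from collections import deque

class StuckDetector:
    # Flags when the same action repeats `window` times in a row,
    # which should trigger automatic replanning.
    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def record(self, action: str) -> bool:
        self.recent.append(action)
        return (len(self.recent) == self.recent.maxlen
                and len(set(self.recent)) == 1)
```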

Cost Model

AI Players use a tiered LLM strategy to keep costs around $0.10/agent/hour:

| Operation | Model Tier | Frequency | Example |
|-----------|------------|-----------|---------|
| Reactive decisions | None (FSM) | Every tick | Combat, survival |
| Action selection | Cheap | 1–3 s cadence | "What command next?" |
| Replanning | Cheap | On failure/completion | "What's my new plan?" |
| Bug classification | Cheap | On anomaly detection | "Is this a real bug?" |
| Reflection | Expensive | ~15 min cadence | "What have I learned?" |
| Goal generation | Expensive | ~15 min cadence | "What should I do next?" |

When the budget is exhausted, the agent degrades gracefully: Layer 3 stops, Layer 2 falls back to rule-based action selection, and Layer 1 continues unchanged.
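The degradation order can be expressed as a tier-selection policy. This is an illustrative sketch; the watermark value and operation names are assumptions, not the actual configuration:

```python
def select_tier(operation: str, budget_remaining: float,
                low_watermark: float = 0.02) -> str:
    # Graceful degradation: Layer 3 stops first, Layer 2 falls back to
    # rules, and the Layer 1 FSM never touches the budget at all.
    if operation == "reactive":
        return "fsm"                      # zero LLM cost, always available
    if budget_remaining <= 0:
        return "rules"                    # Layer 2 rule-based fallback
    if operation in ("reflection", "goal_generation"):
        return "expensive" if budget_remaining > low_watermark else "skip"
    return "cheap"
```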

For the full technical specification, see the AI Players specification.