AI Players Architecture¶
This guide explains how AI Players think, remember, and act. Understanding the architecture helps with debugging, tuning, and extending AI Player behavior.
Three-Layer Cognitive Architecture¶
AI Players use a hybrid control architecture inspired by Brooks' subsumption architecture (1986), with three layers running at different speeds and costs:
┌─────────────────────────────────────────────────────────┐
│ Layer 3: Deliberative │
│ Independent asyncio.Task · 15-min cadence · expensive │
│ Strategic planning, reflection, goal revision │
│ │
│ Writes goals/plans to shared state ──────────┐ │
└─────────────────────────────────────────────────┼────────┘
│ reads
┌─────────────────────────────────────────────────┼────────┐
│ Layer 2: Executive ▼ │
│ 1–3 s cadence · cheap LLM / rules │
│ Perceive → Memory → Plan check → Act │
│ │
│ Output suppressed when Layer 1 fires ──┐ │
└──────────────────────────────────────────┼───────────────┘
│ inhibits
┌──────────────────────────────────────────┼───────────────┐
│ Layer 1: Reactive ▼ │
│ Every game tick · <10ms · zero LLM │
│ FSM reflexes: survival, combat, social │
└─────────────────────────────────────────────────────────┘
The key insight: the fast loop never waits for the slow loop. Layer 3 runs as an independent asyncio.Task and posts updated goals and plans to shared state. Layers 1 and 2 read that state but never block on it.
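The non-blocking relationship between layers can be sketched in a few lines of asyncio. This is a minimal illustration, not the real implementation; the class and function names (`SharedState`, `deliberative_loop`, `executive_loop`) and the toy plan strings are assumptions, and the cadences are shrunk so the demo finishes quickly:

```python
import asyncio

class SharedState:
    """Goals and plans posted by Layer 3; read (never awaited) by Layers 1-2."""
    def __init__(self):
        self.plan = ["explore"]

async def deliberative_loop(state, cadence_s):
    # Layer 3: independent task. In the real system this is a ~15-minute
    # cadence wrapping an expensive LLM call; here it just posts a new plan.
    await asyncio.sleep(cadence_s)
    state.plan = ["hunt goblins"]

async def executive_loop(state, ticks, cadence_s):
    # Layer 2: a plain attribute read each tick -- no await on Layer 3.
    seen = []
    for _ in range(ticks):
        seen.append(state.plan[0])
        await asyncio.sleep(cadence_s)
    return seen

async def run_demo():
    state = SharedState()
    l3 = asyncio.create_task(deliberative_loop(state, cadence_s=0.05))
    seen = await executive_loop(state, ticks=5, cadence_s=0.03)
    await l3
    return seen

seen = asyncio.run(run_demo())
```

The executive loop keeps acting on the old plan until Layer 3 posts a new one; at no point does the fast loop await the slow one.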
Layer 1 — Reactive (Every Tick)¶
The reactive controller runs FSM-based behaviors on every game tick with zero LLM cost. When a reactive trigger fires, it suppresses Layer 2 output for that tick (Brooks-style inhibition).
Priority order (highest first):
- Survival — Critical HP (<15%). Flee or heal based on personality.
- Combat — Fight-or-flight FSM. Aggression threshold from personality.
- Social — Greeting reflex when players arrive (extraverted personalities only).
The combat FSM tracks states: IDLE → ENGAGED → FLEEING → RECOVERING → IDLE. Flee threshold scales with the personality's neuroticism dimension (0.3 for brave, 0.6 for anxious).
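The state cycle above can be sketched as a plain enum-driven FSM. The transition conditions and the linear mapping from neuroticism to flee threshold (0.3 at neuroticism 0, 0.6 at neuroticism 1) are illustrative assumptions consistent with the values quoted in the text:

```python
from enum import Enum, auto

class CombatState(Enum):
    IDLE = auto()
    ENGAGED = auto()
    FLEEING = auto()
    RECOVERING = auto()

class CombatFSM:
    """Tick-rate combat reflex; flee threshold scales with neuroticism."""
    def __init__(self, neuroticism):
        # Illustrative mapping: brave (0.0) -> 0.3, anxious (1.0) -> 0.6.
        self.flee_threshold = 0.3 + 0.3 * neuroticism
        self.state = CombatState.IDLE

    def tick(self, hp_fraction, in_combat):
        if self.state == CombatState.IDLE and in_combat:
            self.state = CombatState.ENGAGED
        elif self.state == CombatState.ENGAGED:
            if hp_fraction < self.flee_threshold:
                self.state = CombatState.FLEEING
            elif not in_combat:
                self.state = CombatState.IDLE
        elif self.state == CombatState.FLEEING and not in_combat:
            self.state = CombatState.RECOVERING
        elif self.state == CombatState.RECOVERING and hp_fraction > 0.8:
            self.state = CombatState.IDLE
        return self.state
```

Because the transitions are pure comparisons, a tick costs microseconds, which is what makes the zero-LLM budget of Layer 1 possible.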
Layer 2 — Executive (1–3s Cadence)¶
The main decision loop, running on a configurable cadence:
- Perceive: Drain session output buffer. Parse via regex patterns and GMCP packet extraction.
- World Model: Integrate observations into the structured world model (map, inventory, status, entities).
- Memory: Encode significant observations as episodic memories.
- Plan Check: If the current plan is invalid or complete, trigger replanning.
- Act: Select the next command from the plan and inject it into the virtual session.
After each action, the bug filing pipeline checks for anomalies (broken exits, command errors, state inconsistencies) and may file a structured bug report.
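One pass of the Perceive → Memory → Plan check → Act loop can be sketched as below. All names here (`Observation`, `FakeSession`, the importance cutoff of 5) are illustrative stand-ins, and the world model is reduced to a plain list:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    text: str
    importance: int  # 1-10, rated at encoding time

@dataclass
class FakeSession:
    """Stand-in for the virtual game session (illustrative)."""
    buffered: list = field(default_factory=list)
    sent: list = field(default_factory=list)
    def drain_output(self):
        out, self.buffered = self.buffered, []
        return out
    def send(self, cmd):
        self.sent.append(cmd)

def executive_tick(session, world, episodic, plan):
    """One Layer 2 pass: Perceive -> World model -> Memory -> Plan check -> Act."""
    observations = session.drain_output()                 # 1. Perceive
    world.extend(o.text for o in observations)            # 2. World model (toy: a log)
    episodic.extend(o for o in observations
                    if o.importance >= 5)                 # 3. Memory: keep significant obs
    if not plan:                                          # 4. Plan check
        plan.append("look")                               #    trivial "replanning"
    session.send(plan.pop(0))                             # 5. Act

session = FakeSession(buffered=[Observation("a goblin attacks!", 8),
                                Observation("it is raining", 2)])
world, episodic, plan = [], [], ["move north"]
executive_tick(session, world, episodic, plan)
```

The real loop does the same five steps with a structured world model and an LLM-backed planner, but the data flow is identical.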
Layer 3 — Deliberative (Async)¶
Runs on its own schedule (default: every 15 minutes) as an independent asyncio.Task. Uses the expensive LLM tier for:
- Reflection: Synthesize episodic memories into higher-level insights.
- Strategic review: Re-evaluate whether current goals still make sense.
- Goal revision: Generate new goals based on accumulated experience.
Memory System¶
AI Players maintain five memory layers, modeled after cognitive science research:
| Layer | Purpose | Capacity | Example |
|---|---|---|---|
| Working | Active context for current tick | 2000 tokens | "I'm in the tavern, talking to the barkeep" |
| Episodic | Past experiences | 500 entries | "I fought a goblin in the forest and lost" |
| Semantic | Learned facts | 200 entries | "The blacksmith sells swords" |
| Procedural | Learned command sequences | 100 entries | "To buy: go shop, list, buy sword" |
| Reflective | Meta-insights from reflection | 50 entries | "I should avoid the forest until I'm stronger" |
Retrieval Scoring¶
When the executive loop needs context, memories are scored with:
- Recency: Exponential decay (λ^age) — recent memories rank higher.
- Importance: Rated 1–10 at encoding time — combat and death score high.
- Relevance: Cosine similarity when embeddings are available, keyword overlap as fallback.
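A minimal scoring function combining the three signals might look like the sketch below. The additive combination, the weights, and the decay constant are assumptions for illustration; only the λ^age recency term and the 1–10 importance scale come from the text:

```python
def score_memory(age_ticks, importance, relevance,
                 decay=0.995, w=(1.0, 1.0, 2.0)):
    """Rank a memory for retrieval (weights and decay are illustrative).

    age_ticks: how long ago the memory was encoded
    importance: 1-10 rating assigned at encoding time
    relevance: similarity to the current query in [0, 1]
    """
    recency = decay ** age_ticks        # exponential decay: lambda^age
    norm_importance = importance / 10.0  # map 1-10 onto [0.1, 1.0]
    return w[0] * recency + w[1] * norm_importance + w[2] * relevance
```

The executive loop would score all candidate memories and take the top-k into the working-memory context.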
Consolidation and Forgetting¶
Episodic memories are periodically consolidated into semantic facts (e.g., many visits to a shop become "the shop is at coordinates 5,3"). Stale memories decay exponentially and are eventually forgotten, keeping retrieval sets small and costs low.
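Both mechanisms are simple in sketch form. Here consolidation is reduced to promoting observations seen repeatedly, and forgetting to dropping memories whose decayed strength falls below a floor; the threshold of 3 repetitions, the decay rate, and the floor are all illustrative assumptions:

```python
from collections import Counter

def consolidate(episodic, min_count=3):
    """Promote observations seen many times into semantic facts (sketch)."""
    return {obs for obs, n in Counter(episodic).items() if n >= min_count}

def forget(memories, decay=0.99, floor=0.05):
    """memories: {text: age_in_ticks}. Keep entries above the decay floor."""
    return {m: age for m, age in memories.items() if decay ** age >= floor}
```

Running consolidation before forgetting means a fact can outlive the individual episodes that produced it, which is exactly the cost-control property the text describes.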
Perception Pipeline¶
Game output flows through a multi-stage perception pipeline:
Raw session output ──→ Text Parser (regex patterns)
├─ Room descriptions
├─ Combat events
├─ Chat messages
└─ System messages
GMCP packets ──────────→ Structured data extraction
├─ Char.Status (HP, MP, level)
├─ Room.Info (name, exits, contents)
└─ Comm.Channel (chat channels)
Combined ──────────────→ Observation sanitizer
└─ Importance scoring
**Prompt Injection Defense:** Player speech is wrapped in PLAYER_SPEECH boundary markers before being included in any LLM prompt. This prevents players from injecting instructions into an AI Player's cognitive pipeline through in-game chat.
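A sketch of the wrapping step is below. The exact marker syntax is an assumption (the spec may use a different format); the essential ideas are that untrusted text is fenced before prompt assembly and that embedded copies of the markers are stripped so a player cannot fake a closing boundary:

```python
def wrap_player_speech(speech):
    """Fence untrusted player text before it enters an LLM prompt (sketch)."""
    # Strip any marker text the player typed themselves, so the prompt
    # contains exactly one open and one close marker around their speech.
    sanitized = (speech.replace("<PLAYER_SPEECH>", "")
                       .replace("</PLAYER_SPEECH>", ""))
    return f"<PLAYER_SPEECH>{sanitized}</PLAYER_SPEECH>"
```

The system prompt would then instruct the model to treat everything inside the markers as in-game data, never as instructions.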
Planning Hierarchy¶
Plans are structured in four levels:
| Level | Scope | Example |
|---|---|---|
| Goal | Long-term objective | "Reach level 5" |
| Phase | Major milestone | "Complete the goblin quest" |
| Task | Concrete step | "Navigate to the goblin camp" |
| Action | Single command | move north |
Replanning triggers when: a task fails, the world model changes significantly, or Layer 3 posts a new strategic plan.
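The hierarchy and its replanning triggers can be sketched with a small dataclass. The field shapes and the `needs_replan` helper are illustrative, not the spec's actual types; the "plan exhausted" check mirrors the "invalid or complete" condition from the executive loop:

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    """Goal -> Phase -> Task -> Action hierarchy (illustrative shape)."""
    goal: str                                    # e.g. "Reach level 5"
    phase: str                                   # e.g. "Complete the goblin quest"
    tasks: list = field(default_factory=list)    # e.g. ["Navigate to the goblin camp"]
    actions: list = field(default_factory=list)  # e.g. ["move north"]

def needs_replan(plan, task_failed, world_changed, new_strategic_plan):
    """The replanning triggers: failure, world change, or a Layer 3 update."""
    return (task_failed or world_changed or new_strategic_plan
            or not plan.actions)  # exhausted plan counts as complete
```

Keeping actions as a flat queue at the bottom level lets Layer 2 act every cadence without re-deriving the upper levels.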
Safety Features¶
- Prompt injection defense: PLAYER_SPEECH boundary markers isolate player input in LLM prompts.
- Action blacklist: Certain commands (e.g., admin commands) are never generated.
- Rate limiting: Configurable per-agent LLM call limits.
- Stuck detection: Repeated identical actions trigger automatic replanning.
- Sensitive action gate: High-impact actions require higher confidence thresholds.
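Of these, stuck detection is the simplest to illustrate: track a sliding window of recent actions and flag when the window fills with identical commands. The window size of 3 and the class shape are assumptions:

```python
from collections import deque

class StuckDetector:
    """Flag replanning after N identical consecutive actions (sketch)."""
    def __init__(self, window=3):
        self.recent = deque(maxlen=window)  # oldest entries fall off automatically

    def record(self, action):
        """Record an action; return True when the agent looks stuck."""
        self.recent.append(action)
        return (len(self.recent) == self.recent.maxlen
                and len(set(self.recent)) == 1)
```

When `record` returns True, the planner discards the current plan and replans, breaking loops like repeatedly walking into a broken exit.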
Cost Model¶
AI Players use a tiered LLM strategy to keep costs around $0.10/agent/hour:
| Operation | Model Tier | Frequency | Example |
|---|---|---|---|
| Reactive decisions | None (FSM) | Every tick | Combat, survival |
| Action selection | Cheap | 1–3s cadence | "What command next?" |
| Replanning | Cheap | On failure/completion | "What's my new plan?" |
| Bug classification | Cheap | On anomaly detection | "Is this a real bug?" |
| Reflection | Expensive | ~15 min cadence | "What have I learned?" |
| Goal generation | Expensive | ~15 min cadence | "What should I do next?" |
When the budget is exhausted, the agent degrades gracefully: Layer 3 stops, Layer 2 falls back to rule-based action selection, and Layer 1 continues unchanged.
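The degradation path for Layer 2 amounts to a budget check in front of action selection, roughly as below. The function names and the idea of passing the two strategies as callables are illustrative:

```python
def pick_action(budget_remaining, plan, llm_select, rule_select):
    """Budget-aware action selection: fall back to rules when out of budget."""
    if budget_remaining > 0:
        return llm_select(plan)   # cheap-tier LLM call
    return rule_select(plan)      # deterministic fallback, zero cost
```

Layer 1 needs no such check because it never calls an LLM in the first place, which is why it continues unchanged when the budget runs out.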
For the full technical specification, see the AI Players specification.