AI Players Architecture

This guide explains how AI Players think, remember, and act. Understanding the architecture helps with debugging, tuning, and extending AI Player behavior.

Three-Layer Cognitive Architecture

AI Players use a hybrid control architecture inspired by Brooks' subsumption architecture (1986), with three layers running at different speeds and costs:

┌─────────────────────────────────────────────────────────┐
│  Layer 3: Deliberative                                  │
│  Independent asyncio.Task · 15-min cadence · expensive  │
│  Strategic planning, reflection, goal revision           │
│                                                         │
│  Writes goals/plans to shared state ──────────┐         │
└─────────────────────────────────────────────────┼────────┘
                                                  │ reads
┌─────────────────────────────────────────────────┼────────┐
│  Layer 2: Executive                             ▼        │
│  1–3 s cadence · cheap LLM / rules                      │
│  Perceive → Memory → Plan check → Act                   │
│                                                         │
│  Output suppressed when Layer 1 fires ──┐               │
└──────────────────────────────────────────┼───────────────┘
                                           │ inhibits
┌──────────────────────────────────────────┼───────────────┐
│  Layer 1: Reactive                       ▼               │
│  Every game tick · <10ms · zero LLM                     │
│  FSM reflexes: survival, combat, social                 │
└─────────────────────────────────────────────────────────┘

The key insight: the fast loop never waits for the slow loop. Layer 3 runs as an independent asyncio.Task and posts updated goals and plans to shared state. Layers 1 and 2 read that state but never block on it.
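A minimal sketch of this non-blocking handoff, assuming nothing about the real codebase beyond what is described above (the `SharedState` class, the stand-in cadences, and the goal strings are all illustrative):

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class SharedState:
    # Goals posted by Layer 3; read (never awaited) by Layers 1 and 2.
    goals: list = field(default_factory=lambda: ["explore"])

async def deliberative_loop(state: SharedState, tick: asyncio.Event):
    # Layer 3: slow, expensive. Posts results to shared state; the real
    # loop sleeps ~15 minutes between iterations.
    await tick.wait()                       # stand-in for the 15-min cadence
    state.goals = ["reach level 5"]         # stand-in for an expensive LLM call

async def executive_loop(state: SharedState, tick: asyncio.Event, ticks=4):
    # Layer 2: reads the newest goals each tick without awaiting Layer 3.
    seen = []
    for i in range(ticks):
        seen.append(state.goals[0])         # plain read -- never blocks
        if i == 1:
            tick.set()                      # let Layer 3 run mid-stream
        await asyncio.sleep(0)              # yield to other tasks
    return seen

async def main():
    state, tick = SharedState(), asyncio.Event()
    layer3 = asyncio.create_task(deliberative_loop(state, tick))
    seen = await executive_loop(state, tick)
    await layer3
    return seen
```

The fast loop keeps acting on the old goals until the slow task posts new ones; no tick ever stalls waiting for the deliberative layer.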

Layer 1 — Reactive (Every Tick)

The reactive controller runs FSM-based behaviors on every game tick with zero LLM cost. When a reactive trigger fires, it suppresses Layer 2 output for that tick (Brooks-style inhibition).

Priority order (highest first):

  1. Survival — Critical HP (<15%). Flee or heal based on personality.
  2. Combat — Fight-or-flight FSM. Aggression threshold from personality.
  3. Social — Greeting reflex when players arrive (extraverted personalities only).

The combat FSM tracks states: IDLE → ENGAGED → FLEEING → RECOVERING → IDLE. Flee threshold scales with the personality's neuroticism dimension (0.3 for brave, 0.6 for anxious).
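The FSM above can be sketched as a pure per-tick transition function. This is a sketch, not the real implementation; the linear neuroticism-to-threshold mapping is an assumption chosen to reproduce the 0.3/0.6 endpoints mentioned above:

```python
from enum import Enum, auto

class CombatState(Enum):
    IDLE = auto()
    ENGAGED = auto()
    FLEEING = auto()
    RECOVERING = auto()

def flee_threshold(neuroticism: float) -> float:
    # Hypothetical linear mapping: brave (0.0) -> 0.3, anxious (1.0) -> 0.6.
    return 0.3 + 0.3 * neuroticism

def step(state: CombatState, hp_frac: float, in_combat: bool,
         neuroticism: float) -> CombatState:
    # One transition per game tick -- pure branching, zero LLM cost.
    if state is CombatState.IDLE:
        return CombatState.ENGAGED if in_combat else state
    if state is CombatState.ENGAGED:
        if hp_frac < flee_threshold(neuroticism):
            return CombatState.FLEEING
        return state if in_combat else CombatState.IDLE
    if state is CombatState.FLEEING:
        return state if in_combat else CombatState.RECOVERING
    # RECOVERING -> IDLE once HP is back above the flee threshold.
    return CombatState.IDLE if hp_frac >= flee_threshold(neuroticism) else state
```

Because the function takes the personality dimension as an input, the same FSM serves both brave and anxious agents without duplicated logic.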

Layer 2 — Executive (1–3s Cadence)

The main decision loop, running on a configurable cadence:

Perceive → World Model → Memory Encode → Plan Check → Act
  1. Perceive: Drain session output buffer. Parse via regex patterns and GMCP packet extraction.
  2. World Model: Integrate observations into the structured world model (map, inventory, status, entities).
  3. Memory: Encode significant observations as episodic memories.
  4. Plan Check: If the current plan is invalid or complete, trigger replanning.
  5. Act: Select the next command from the plan and inject it into the virtual session.
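The five steps above can be sketched as one executive tick. All types and thresholds here are illustrative stand-ins (the real parser, world model, and planner are far richer); the sketch only shows how the stages hand data to each other:

```python
from dataclasses import dataclass

@dataclass
class Observation:
    text: str
    importance: int = 1      # 1-10, rated at encoding time

@dataclass
class Plan:
    commands: list
    def is_complete(self):
        return not self.commands
    def next_command(self):
        return self.commands.pop(0)

def executive_tick(raw_output, world, episodic, plan, replan):
    # 1. Perceive: parse drained session output into observations
    #    (toy importance rule standing in for real scoring).
    observations = [Observation(line, importance=8 if "goblin" in line else 2)
                    for line in raw_output]
    # 2. World model: fold observations into the structured model.
    world.extend(o.text for o in observations)
    # 3. Memory: encode only significant observations as episodic memories.
    episodic.extend(o for o in observations if o.importance >= 5)
    # 4. Plan check: replan when the current plan is exhausted.
    if plan.is_complete():
        plan = replan()                      # cheap LLM tier in practice
    # 5. Act: return the next command to inject into the virtual session.
    return plan, plan.next_command()
```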

After each action, the bug filing pipeline checks for anomalies (broken exits, command errors, state inconsistencies) and may file a structured bug report.

Layer 3 — Deliberative (Async)

Runs on its own schedule (default: every 15 minutes) as an independent asyncio.Task. Uses the expensive LLM tier for:

  • Reflection: Synthesize episodic memories into higher-level insights.
  • Strategic review: Re-evaluate whether current goals still make sense.
  • Goal revision: Generate new goals based on accumulated experience.

Memory System

AI Players maintain five memory layers, modeled after cognitive science research:

| Layer | Purpose | Capacity | Example |
|-------|---------|----------|---------|
| Working | Active context for current tick | 2000 tokens | "I'm in the tavern, talking to the barkeep" |
| Episodic | Past experiences | 500 entries | "I fought a goblin in the forest and lost" |
| Semantic | Learned facts | 200 entries | "The blacksmith sells swords" |
| Procedural | Learned command sequences | 100 entries | "To buy: go shop, list, buy sword" |
| Reflective | Meta-insights from reflection | 50 entries | "I should avoid the forest until I'm stronger" |

Retrieval Scoring

When the executive loop needs context, memories are scored with:

score = α · recency + β · importance + γ · relevance
  • Recency: Exponential decay (λ^age) — recent memories rank higher.
  • Importance: Rated 1–10 at encoding time — combat and death score high.
  • Relevance: Cosine similarity when embeddings are available, keyword overlap as fallback.
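The scoring formula can be sketched directly, using the keyword-overlap fallback for relevance (the weights, decay rate, and dictionary-based memory format here are illustrative assumptions, not the real defaults):

```python
def score(memory, now, query_terms, alpha=1.0, beta=1.0, gamma=1.0,
          decay=0.99):
    # score = alpha*recency + beta*importance + gamma*relevance
    recency = decay ** (now - memory["t"])        # exponential decay, lambda^age
    importance = memory["importance"] / 10        # normalize 1-10 to 0-1
    # Relevance fallback: keyword overlap with the query
    # (cosine similarity over embeddings when they are available).
    words = set(memory["text"].lower().split())
    relevance = len(words & query_terms) / max(len(query_terms), 1)
    return alpha * recency + beta * importance + gamma * relevance

def retrieve(memories, now, query, k=3):
    # Return the k best-scoring memories for the current context.
    terms = set(query.lower().split())
    return sorted(memories, key=lambda m: score(m, now, terms), reverse=True)[:k]
```

Note how the terms trade off: an old but important, relevant memory (a lost fight) can outrank a recent but trivial one.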

Consolidation and Forgetting

Episodic memories are periodically consolidated into semantic facts (e.g., many visits to a shop become "the shop is at coordinates 5,3"). Stale memories decay exponentially and are eventually forgotten, keeping retrieval sets small and costs low.
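Both mechanisms can be sketched in a few lines; the support count, decay rate, and retention floor below are illustrative parameters, not the system's actual values:

```python
from collections import Counter

def consolidate(episodic, min_support=3):
    # Repeated episodic observations graduate into semantic facts
    # (e.g. many shop visits become one "the shop is at 5,3" fact).
    counts = Counter(m["text"] for m in episodic)
    return [fact for fact, n in counts.items() if n >= min_support]

def decay_and_forget(memories, now, decay=0.99, floor=0.2):
    # Retention = importance weight times exponential recency decay;
    # entries below the floor are forgotten, keeping retrieval sets small.
    return [m for m in memories
            if (m["importance"] / 10) * decay ** (now - m["t"]) >= floor]
```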

Perception Pipeline

Game output flows through a multi-stage perception pipeline:

Raw session output ──→ Text Parser (regex patterns)
                        ├─ Room descriptions
                        ├─ Combat events
                        ├─ Chat messages
                        └─ System messages

GMCP packets ──────────→ Structured data extraction
                        ├─ Char.Status (HP, MP, level)
                        ├─ Room.Info (name, exits, contents)
                        └─ Comm.Channel (chat channels)

Combined ──────────────→ Observation sanitizer
                        └─ Importance scoring
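A sketch of the two input stages feeding a combined observation list. The regex pattern and the GMCP payload shapes here are simplified assumptions; real MUD output needs many more patterns:

```python
import json
import re

# Illustrative pattern -- real parsers maintain a library of these.
ROOM_RE = re.compile(r"^You are in (?P<room>.+)\.$")

def perceive(raw_lines, gmcp_packets):
    observations = []
    # Stage 1: regex parsing of plain text (rooms, combat, chat, system).
    for line in raw_lines:
        m = ROOM_RE.match(line)
        if m:
            observations.append(("room", m.group("room")))
    # Stage 2: structured extraction from GMCP packages (JSON payloads).
    for package, payload in gmcp_packets:
        if package == "Char.Status":
            observations.append(("status", json.loads(payload)))
    return observations
```

GMCP data needs no fuzzy parsing, which is why it is handled as a separate, lossless channel alongside the regex stage.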

Prompt Injection Defense

Player speech is wrapped in PLAYER_SPEECH boundary markers before being included in any LLM prompt. This prevents players from injecting instructions into an AI Player's cognitive pipeline through in-game chat.
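The exact marker syntax is internal to the system; a minimal sketch, assuming angle-bracket markers, shows the key detail that marker-lookalikes typed by the player must be stripped before the real boundaries are added:

```python
def wrap_player_speech(text: str) -> str:
    # Remove any boundary markers the player typed themselves, so they
    # cannot fake a "close" marker and smuggle instructions after it.
    sanitized = (text.replace("<PLAYER_SPEECH>", "")
                     .replace("</PLAYER_SPEECH>", ""))
    return f"<PLAYER_SPEECH>{sanitized}</PLAYER_SPEECH>"
```

Everything between the markers is then treated by the prompt as quoted data, never as instructions.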

Planning Hierarchy

Plans are structured in four levels:

| Level | Scope | Example |
|-------|-------|---------|
| Goal | Long-term objective | "Reach level 5" |
| Phase | Major milestone | "Complete the goblin quest" |
| Task | Concrete step | "Navigate to the goblin camp" |
| Action | Single command | `move north` |

Replanning triggers when: a task fails, the world model changes significantly, or Layer 3 posts a new strategic plan.
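The three triggers combine with a simple OR; this sketch assumes a change-score world model and a version counter on the strategic plan, neither of which is confirmed by the source:

```python
def should_replan(plan, world, strategic_version, last_seen_version):
    # Any single trigger forces a replan on the next executive tick.
    task_failed = plan.get("last_task_failed", False)
    world_changed = world.get("change_score", 0.0) > 0.5  # illustrative threshold
    new_strategy = strategic_version != last_seen_version  # Layer 3 posted a plan
    return task_failed or world_changed or new_strategy
```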

Safety Features

  • Prompt injection defense: PLAYER_SPEECH boundary markers isolate player input in LLM prompts.
  • Action blacklist: Certain commands (e.g., admin commands) are never generated.
  • Rate limiting: Configurable per-agent LLM call limits.
  • Stuck detection: Repeated identical actions trigger automatic replanning.
  • Sensitive action gate: High-impact actions require higher confidence thresholds.
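Stuck detection in particular is cheap to implement; a sketch with an assumed window of three identical actions (the real window size is not specified):

```python
from collections import deque

class StuckDetector:
    # Flags when the same action repeats `window` times in a row,
    # which should trigger automatic replanning.
    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)

    def record(self, action: str) -> bool:
        self.recent.append(action)
        return (len(self.recent) == self.recent.maxlen
                and len(set(self.recent)) == 1)
```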

Cost Model

AI Players use a tiered LLM strategy to keep costs around $0.10/agent/hour:

| Operation | Model Tier | Frequency | Example |
|-----------|------------|-----------|---------|
| Reactive decisions | None (FSM) | Every tick | Combat, survival |
| Action selection | Cheap | 1–3 s cadence | "What command next?" |
| Replanning | Cheap | On failure/completion | "What's my new plan?" |
| Bug classification | Cheap | On anomaly detection | "Is this a real bug?" |
| Reflection | Expensive | ~15 min cadence | "What have I learned?" |
| Goal generation | Expensive | ~15 min cadence | "What should I do next?" |

When the budget is exhausted, the agent degrades gracefully: Layer 3 stops, Layer 2 falls back to rule-based action selection, and Layer 1 continues unchanged.
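The degradation order can be expressed as a tier-selection policy. This is an illustrative sketch; the watermark value and operation names are assumptions, not the actual configuration:

```python
def select_tier(operation: str, budget_remaining: float,
                low_watermark: float = 0.02) -> str:
    # Graceful degradation: Layer 3 stops first, Layer 2 falls back to
    # rules, and the Layer 1 FSM never touches the budget at all.
    if operation == "reactive":
        return "fsm"                      # zero LLM cost, always available
    if budget_remaining <= 0:
        return "rules"                    # Layer 2 rule-based fallback
    if operation in ("reflection", "goal_generation"):
        return "expensive" if budget_remaining > low_watermark else "skip"
    return "cheap"
```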

For the full technical specification, see the AI Players specification.