TelTest — E2E Testing Framework for MUDs¶
Version: 1.1 (post-adversarial review)
Status: Implemented
Author(s): MAID Core Team
Date: 2026-02-18
Priority: P1 — Critical for regression safety as game content grows
Table of Contents¶
- Executive Summary
- Problem Statement & Current State
- Design Goals & Non-Goals
- Architecture Overview
- Core Components
- 1. MUDClient — Async Telnet Driver
- 2. Expectation Engine
- 3. Test Fixtures & Server Lifecycle
- 4. Script DSL
- 5. Session Recording & Playback
- API Design
- MUDClient API
- Fixture API
- Script DSL API
- Telnet Protocol Handling
- State Isolation & Determinism
- Test Patterns & Examples
- Error Handling & Diagnostics
- Configuration
- CI/CD Integration
- Package Structure
- Migration & Adoption
- Future Work
- Appendix A: Telnet Protocol Reference
- Appendix B: MAID Login Flow Reference
- Appendix C: Adversarial Review Summary
Executive Summary¶
TelTest is a pytest-native E2E testing framework for MUD engines. It provides an async telnet client that connects to a running server and drives it through text-based send/expect interactions — the same way a real player would. Think of it as "Playwright for MUDs."
Key capabilities:
- MUDClient — An async Python telnet client with a pattern-matching `expect()` that handles IAC negotiation, ANSI stripping, and prompt detection
- Pytest fixtures — Managed server lifecycle (start engine → run tests → teardown) with automatic port allocation to avoid conflicts
- Script DSL — Declarative YAML-based test scripts for non-programmers to author E2E scenarios
- Session recording — Capture real play sessions and convert them into reproducible test cases
Business impact: Enables regression testing of the full player experience — login, character creation, combat, quests — across every commit. Catches integration bugs that unit tests miss (protocol negotiation, command routing, state transitions across ticks).
Problem Statement & Current State¶
What Exists Today¶
| Layer | Test Coverage | Notes |
|---|---|---|
| ECS core | ✅ Extensive | Unit tests for World, Entity, Component, System |
| Command handlers | ✅ Good | MockSession-based tests for individual commands |
| Content packs | ✅ Good | Unit tests for pack protocol compliance |
| Network protocol | ⚠️ Minimal | Mocked asyncio.start_server, no real connections |
| Login/auth flow | ❌ None | CharacterHandler/LoginHandler untested end-to-end |
| Full player journey | ❌ None | No test connects, logs in, plays, and disconnects |
| Cross-system interactions | ❌ None | Combat→inventory→quest chains untested |
The Problem¶
- No full-stack tests. A bug where `character.race` is a string instead of an enum passes all unit tests but crashes every new player on login (we just fixed this exact bug).
- No protocol-level tests. Telnet negotiation, GMCP dispatch, MCCP compression, and prompt detection are tested only via mocks.
- Manual QA is the only safety net. Every release requires someone to telnet in and manually walk through character creation.
- Content pack interactions are invisible. When stdlib changes break classic-rpg, nothing catches it until runtime.
Success Criteria¶
| ID | Criterion | Metric |
|---|---|---|
| SC-1 | Full login→play→quit flow tested | ≥1 test per connection path (telnet, websocket) |
| SC-2 | Character creation tested | All races/classes create successfully |
| SC-3 | Core gameplay loop tested | look, move, get, drop, inventory, say |
| SC-4 | Tests run in CI | < 60s total, no port conflicts, no flakiness |
| SC-5 | Non-programmers can author tests | YAML script DSL with clear documentation |
| SC-6 | Portable to other MUDs | Core MUDClient has zero MAID-specific dependencies |
Design Goals & Non-Goals¶
Goals¶
- Pytest-native — Tests are normal async pytest functions using fixtures
- Real network I/O — Tests connect over TCP, exercising the full stack
- Fast — Server starts once per test session, tests reuse the connection where possible
- Deterministic — Automatic port allocation, seeded randomness, tick synchronization
- Portable — `MUDClient` is a standalone async telnet client usable with any MUD
- Debuggable — Rich failure messages showing expected vs. received output with context
Non-Goals¶
- GUI testing — Web frontend testing is out of scope (Playwright covers that)
- Load/stress testing — Performance benchmarks are a separate concern
- AI response testing — LLM outputs are non-deterministic; test the plumbing, not the prose
- Cross-network testing — Tests run against localhost only
Architecture Overview¶
┌─────────────────────────────────────────────────────────────┐
│ pytest session │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌───────────────────┐ │
│ │ test_login.py│ │test_combat.py│ │ test_scripts.py │ │
│ │ │ │ │ │ (YAML runner) │ │
│ └──────┬───────┘ └──────┬───────┘ └────────┬──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ TelTest Fixtures Layer │ │
│ │ • mud_server (session-scoped, starts GameEngine) │ │
│ │ • mud_client (function-scoped, connects & logs in) │ │
│ │ • raw_client (function-scoped, bare connection) │ │
│ └──────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ MUDClient │ │
│ │ • connect() / disconnect() │ │
│ │ • send(text) / expect(pattern) │ │
│ │ • expect_prompt() / expect_sequence() │ │
│ │ • ANSI stripping / IAC negotiation │ │
│ │ • Output buffer with history │ │
│ └──────────────────────┬──────────────────────────────┘ │
│ │ TCP │
└─────────────────────────┼───────────────────────────────────┘
│
▼
┌───────────────────────┐
│ MAID Server │
│ (GameEngine + Telnet │
│ on ephemeral port) │
└───────────────────────┘
Data Flow¶
- pytest discovers test functions and invokes the session-scoped `mud_server` fixture
- `mud_server` starts a `GameEngine` with in-memory storage on a random free port
- Each test gets a `mud_client` — a `MUDClient` instance connected over TCP
- The test calls `send()`/`expect()` to drive the interaction
- `MUDClient` handles telnet IAC negotiation transparently
- On fixture teardown, the client disconnects and the server stops
Core Components¶
1. MUDClient — Async Telnet Driver¶
The heart of the framework. A pure-async telnet client with pattern-matching output expectations. Zero MAID-specific dependencies — it speaks raw telnet.
Responsibilities:
- TCP connection management with configurable timeouts
- Telnet IAC negotiation (respond to WILL/DO/WONT/DONT)
- Optional GMCP support (send/receive structured data)
- ANSI escape code stripping for clean text matching
- Output buffering with rolling history
- Pattern-based `expect()` with timeout and failure diagnostics
- Prompt detection (text not terminated by `\r\n`)
Key design decisions:
- Async-native — Built on `asyncio.open_connection`, no threads
- Non-blocking reads — A continuous background reader task fills an output buffer; `expect()` scans from a read cursor, with an `asyncio.Event` signaling new data
- Cursor-based matching — Each `expect()` and `send()` advances a read cursor so assertions only match text received since the last interaction
- IAC handled transparently — Tests never see protocol bytes
- ANSI stripping is configurable — `MUDClient(strip_ansi=True)` (on by default for tests)
- History preserved — Full session transcript available for debugging (ring buffer, configurable max size, default 10,000 lines)
- Reader task contract — Created in `connect()`, cancelled in `disconnect()`. Reader exceptions are captured and re-raised in the next `expect()` call. `disconnect()` sets a sentinel that unblocks any waiting `expect()`. A `reader_error` property exposes the last exception for diagnostics.
- EOF/crash handling — Socket close is detected immediately. If the server drops the connection, `expect()` raises `ConnectionClosed` (not a timeout). Fixtures can catch this to fail fast or skip remaining tests.
2. Expectation Engine¶
The expect() method is the primary assertion mechanism. It maintains a read
cursor that advances through the output buffer, ensuring each expect() only
matches text received after the previous expect() or send() call. This
prevents stale output from prior commands from satisfying new assertions.
Design decision (post-review): All three adversarial reviewers identified "full buffer scan" as a critical race condition. The cursor-based approach eliminates an entire class of flaky tests where old output matches new patterns.
It supports:
| Mode | Example | Description |
|---|---|---|
| Substring | `expect("Welcome")` | Waits for text containing "Welcome" |
| Regex | `expect(re.compile(r"Level \d+"))` | Waits for regex match |
| Exact line | `expect_line("Your choice: ")` | Matches a complete line exactly |
| Prompt | `expect_prompt("> ")` | Waits for a prompt (see Prompt Detection below) |
| Sequence | `expect_sequence(["Name:", "Race:", "Class:"])` | Ordered multi-pattern |
| Absence | `expect_not("Error", until="> ")` | Asserts text does NOT appear before boundary |
| Any of | `expect_any(["Yes", "No"])` | Returns whichever matches first |
| Full buffer | `expect("pattern", from_start=True)` | Ignores cursor; searches all output |
Cursor behavior:
- `send()` advances the cursor to the current buffer end
- `expect()` searches from the cursor forward; on a match, it advances the cursor past the match
- `clear_buffer()` resets both the buffer and the cursor
- `from_start=True` bypasses the cursor for full-history assertions
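The cursor mechanics above can be sketched as a small buffer class. This is an illustrative reduction, not the real implementation: `ExpectBuffer`, `feed()`, and `mark_sent()` are hypothetical names standing in for the reader-task/`send()` internals.

```python
import asyncio
import re


class ExpectBuffer:
    """Minimal sketch of cursor-based expect() matching."""

    def __init__(self) -> None:
        self._text = ""
        self._cursor = 0  # only text after this offset can satisfy expect()
        self._new_data = asyncio.Event()

    def feed(self, chunk: str) -> None:
        """Called by the background reader task when data arrives."""
        self._text += chunk
        self._new_data.set()

    def mark_sent(self) -> None:
        """send() advances the cursor so stale output can't match."""
        self._cursor = len(self._text)

    async def expect(self, pattern: str, timeout: float = 5.0) -> str:
        regex = re.compile(re.escape(pattern))
        deadline = asyncio.get_running_loop().time() + timeout
        while True:
            m = regex.search(self._text, self._cursor)
            if m:
                self._cursor = m.end()  # advance past the match
                return m.group(0)
            remaining = deadline - asyncio.get_running_loop().time()
            if remaining <= 0:
                raise AssertionError(f"timeout waiting for {pattern!r}")
            self._new_data.clear()
            try:
                await asyncio.wait_for(self._new_data.wait(), remaining)
            except asyncio.TimeoutError:
                raise AssertionError(f"timeout waiting for {pattern!r}")
```

Note how, after `mark_sent()`, a pattern that already appeared earlier in the buffer no longer matches — the property that eliminates the stale-output flakiness described above.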
Timeout behavior:
- Default timeout: 5 seconds (configurable per-call and globally)
- On timeout: raises `ExpectTimeout` with a full buffer dump showing what WAS received
- Timeout auto-scales: CI environments get a 2x multiplier via `TELTEST_TIMEOUT_MULTIPLIER`
expect_sequence() timeout semantics:
- `timeout` is the total time for all patterns combined (default)
- `per_step_timeout` overrides with a per-pattern limit if provided
- This is documented explicitly to avoid ambiguity
Prompt detection:
Prompts are text that arrives without a trailing \r\n. Because TCP packet
fragmentation can cause false positives (a partial line looks like a prompt),
expect_prompt() uses a multi-signal approach:
- Primary: Telnet GA (Go Ahead) signal, if the server sends it after SGA negotiation
- Secondary: Configurable prompt regex (e.g., `r"^>|^\w+> "`) — matches only known prompt patterns, not arbitrary partial lines
- Fallback: Quiescence detection — if no new data arrives for a configurable `prompt_settle_time` (default 0.3s) and the buffer ends without `\r\n`, treat it as a prompt
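The quiescence fallback can be sketched as a coroutine that watches a text source. This is illustrative only — `detect_prompt_by_quiescence` is a hypothetical name, and the real reader would wake on its data `Event` rather than polling:

```python
import asyncio
from typing import Callable


async def detect_prompt_by_quiescence(
    read_buffer: Callable[[], str],
    settle_time: float = 0.3,
    timeout: float = 5.0,
) -> str:
    """Fallback prompt signal: the buffer has been quiet for `settle_time`
    AND its last line is unterminated (no trailing newline)."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    last_seen = read_buffer()
    quiet_since = loop.time()
    while loop.time() < deadline:
        await asyncio.sleep(settle_time / 3)  # poll; real code waits on an Event
        current = read_buffer()
        now = loop.time()
        if current != last_seen:
            # New data arrived: reset the quiescence clock
            last_seen, quiet_since = current, now
            continue
        settled = now - quiet_since >= settle_time
        unterminated = bool(current) and not current.endswith(("\r\n", "\n"))
        if settled and unterminated:
            return current.rsplit("\n", 1)[-1]  # the unterminated prompt line
    raise TimeoutError("no prompt detected")
```

The settle timer restarts on every new chunk, which is what defends against the packet-fragmentation false positive: a partial line is only treated as a prompt once the connection has actually gone quiet.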
Design decision (post-review): Gemini identified that relying solely on a missing `\r\n` is a race condition with packet fragmentation. The multi-signal approach provides defense in depth.
3. Test Fixtures & Server Lifecycle¶
Session-scoped server fixture — The engine starts once per pytest session (or per-module if configured), avoiding the ~2s startup cost per test.
Function-scoped client fixture — Each test gets a fresh TCP connection. The fixture handles connect, optional login, and guaranteed disconnect on teardown.
State isolation — Between tests, the world state is snapshot/restored to prevent state bleed. See State Isolation & Determinism.
Fixture hierarchy:
mud_server (session) ─── starts GameEngine + telnet on ephemeral port
│ waits for readiness probe before yielding
├── mud_client (function) ─── connects, logs in via AuthDriver, selects character
├── raw_client (function) ─── connects only (for testing login flow itself)
└── registered_account (session) ─── creates a test account; validates on each use
Port allocation:
- Uses `socket.bind(('', 0))` to get a free ephemeral port
- The port is passed to `GameEngine` settings before startup
- Eliminates port conflicts in parallel CI
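The bind-to-zero trick is a one-liner worth spelling out. A minimal sketch (`allocate_free_port` is a hypothetical helper name; note the port is released before return, so a tiny reuse window exists — acceptable for localhost CI):

```python
import socket


def allocate_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an ephemeral port by binding to port 0.

    The socket is closed before returning, so another process could in
    principle grab the port first -- fine for single-host test runs.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, 0))          # port 0 = "pick any free port"
        return s.getsockname()[1]  # the port the OS actually assigned
```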
Server readiness:
The mud_server fixture waits for the telnet listener to accept connections before
yielding, using a retry loop with exponential backoff:
import asyncio
import time


async def _wait_for_ready(host: str, port: int, timeout: float = 15.0) -> None:
deadline = time.monotonic() + timeout
while time.monotonic() < deadline:
try:
r, w = await asyncio.open_connection(host, port)
w.close()
await w.wait_closed()
return
except (ConnectionRefusedError, OSError):
await asyncio.sleep(0.1)
raise TimeoutError(f"Server not ready on {host}:{port} after {timeout}s")
Server liveness:
If the engine crashes mid-session, subsequent tests skip gracefully:
@pytest.fixture
async def mud_client(mud_server):
if not mud_server.is_alive():
pytest.skip("Server crashed in a previous test")
...
AuthDriver abstraction:
The login flow is abstracted behind a pluggable AuthDriver protocol, keeping
the fixture layer portable. MAID provides MaidAuthDriver; other MUDs implement
their own:
from typing import Protocol


class AuthDriver(Protocol):
"""Pluggable login flow for different MUD engines."""
async def register(
self, client: MUDClient, username: str, password: str, email: str
) -> None:
"""Register a new account via the MUD's registration flow."""
async def login(
self, client: MUDClient, username: str, password: str
) -> None:
"""Log in to an existing account."""
async def select_character(
self, client: MUDClient, character_name: str
) -> None:
"""Select or create a character and enter the game world."""
Design decision (post-review): All three reviewers identified the MAID-coupled fixtures as undermining the portability claims. The `AuthDriver` protocol makes the fixture layer genuinely reusable across MUD engines.
4. Script DSL¶
A YAML-based format for declarative test scenarios. Aimed at game designers and QA who may not write Python.
# tests/e2e/scripts/test_character_creation.yaml
name: "Character creation - Human Warrior"
description: "Verify a new player can create a Human Warrior character"
tags: [smoke, character-creation]
setup:
account:
username: "testplayer"
password: "testpass123"
register: true
steps:
- send: "1"
expect: "Select your race"
- send: "1"
expect: "Select your class"
- send: "1"
expect: "Select your gender"
- send: "3"
expect: "Character Summary"
expect_all:
- "Human"
- "Warrior"
- "Neutral"
- send: "Y"
expect: "Welcome to the world"
- send: "look"
expect_any:
- "You see"
- "exits:"
- send: "quit"
expect: "Goodbye"
Script runner — A pytest plugin collects .yaml files from a configured
directory and generates parametrized test cases. Each script becomes a test item
in pytest output:
tests/e2e/test_scripts.py::test_character_creation_human_warrior PASSED
tests/e2e/test_scripts.py::test_basic_movement PASSED
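The core of the runner — mapping one parsed step onto client calls — might look like the sketch below. This is a reduced illustration under assumptions: `run_step` is a hypothetical name, the step dict is YAML already parsed into Python, and `branch`/`group`/`note` handling is omitted.

```python
import asyncio
from typing import Any


async def run_step(client: Any, step: dict[str, Any]) -> None:
    """Dispatch one parsed script step onto a MUDClient-like object.

    Keys mirror the DSL schema: delay, send, expect, expect_all, expect_any.
    """
    if "delay" in step:
        await asyncio.sleep(step["delay"])
    if "send" in step:
        await client.send(step["send"])
    if "expect" in step:
        await client.expect(step["expect"])
    for pattern in step.get("expect_all", []):
        await client.expect(pattern)
    if "expect_any" in step:
        await client.expect_any(step["expect_any"])
```

A runner built this way keeps the DSL declarative: each YAML step is data, and all timing/matching behavior lives in the one dispatch function.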
5. Session Recording & Playback¶
Record mode — Wraps a MUDClient to capture all send/receive pairs with
timestamps. Saves to a .teltest JSON file:
{
"recorded_at": "2026-02-18T06:00:00Z",
"server": "localhost:4000",
"events": [
{"t": 0.0, "type": "recv", "text": "Welcome to MAID\r\n"},
{"t": 0.1, "type": "recv", "text": "Your choice: "},
{"t": 1.2, "type": "send", "text": "R"},
{"t": 1.4, "type": "recv", "text": "Choose a username: "},
...
]
}
Playback mode — Converts a recording into a YAML script or a Python test, replacing literal text with fuzzy patterns where appropriate.
CLI tool:
# Record a session
uv run teltest record localhost 4000 -o session.teltest
# Convert recording to YAML test
uv run teltest convert session.teltest -o test_flow.yaml
# Run scripts directly
uv run teltest run tests/e2e/scripts/ --server localhost:4000
API Design¶
MUDClient API¶
class MUDClient:
"""Async telnet client for MUD E2E testing.
This client is MUD-engine agnostic. It speaks raw telnet and
provides pattern-matching expectations on text output.
Example:
async with MUDClient("localhost", 4000) as client:
await client.expect("Welcome")
await client.send("connect user pass")
await client.expect("Entering World")
"""
def __init__(
self,
host: str = "localhost",
port: int = 4000,
*,
timeout: float = 5.0,
strip_ansi: bool = True,
encoding: str = "utf-8",
negotiate_gmcp: bool = False,
) -> None: ...
# --- Connection lifecycle ---
async def connect(self) -> None:
"""Open TCP connection and complete telnet negotiation."""
async def disconnect(self) -> None:
"""Gracefully close the connection."""
async def __aenter__(self) -> "MUDClient": ...
async def __aexit__(self, *exc: object) -> None: ...
@property
def is_connected(self) -> bool: ...
# --- Sending ---
async def send(self, text: str) -> None:
"""Send a line of text (appends \\r\\n)."""
async def send_raw(self, data: bytes) -> None:
"""Send raw bytes without modification."""
# --- Expecting ---
async def expect(
self,
pattern: str | re.Pattern[str],
*,
timeout: float | None = None,
from_start: bool = False,
) -> Match:
"""Wait for output matching pattern. Returns Match with context.
Searches output received since the last send()/expect() call (cursor-based).
Set from_start=True to search the entire buffer history.
Args:
pattern: Substring or compiled regex to match against output.
timeout: Override default timeout for this call.
from_start: If True, search from buffer start ignoring cursor.
Returns:
Match object containing matched text, full line, and buffer context.
Raises:
ExpectTimeout: If pattern not found within timeout.
ConnectionClosed: If the server closed the connection.
"""
async def expect_prompt(
self,
prompt: str | re.Pattern[str] = "> ",
*,
timeout: float | None = None,
) -> str:
"""Wait for a prompt using multi-signal detection.
Detection priority:
1. Telnet GA (Go Ahead) signal
2. Match against prompt pattern
3. Quiescence (no new data for prompt_settle_time)
"""
async def expect_line(
self,
text: str,
*,
timeout: float | None = None,
) -> str:
"""Wait for an exact complete line."""
async def expect_sequence(
self,
patterns: list[str | re.Pattern[str]],
*,
timeout: float | None = None,
per_step_timeout: float | None = None,
) -> list[Match]:
"""Wait for multiple patterns in order.
Args:
timeout: Total time allowed for all patterns (default).
per_step_timeout: If set, each pattern gets this much time instead.
"""
async def expect_any(
self,
patterns: list[str | re.Pattern[str]],
*,
timeout: float | None = None,
) -> tuple[int, Match]:
"""Wait for any of the patterns. Returns (index, match)."""
async def expect_not(
self,
pattern: str | re.Pattern[str],
*,
until: str | re.Pattern[str],
timeout: float | None = None,
) -> None:
"""Assert that pattern does NOT appear before the 'until' boundary.
This avoids unconditional sleeps by waiting for a positive signal
(e.g., the next prompt) and asserting the negative pattern didn't
appear before it.
Args:
pattern: The text/regex that must NOT appear.
until: A positive boundary pattern to wait for.
timeout: Max time to wait for the 'until' boundary.
Raises:
UnexpectedMatch: If pattern appears before until.
ExpectTimeout: If until is not found within timeout.
"""
# --- Buffer access ---
@property
def output(self) -> str:
"""All received text since connection (ring buffer, max 10k lines)."""
@property
def recent(self) -> str:
"""Text received since last send() or expect() call (cursor-based)."""
def clear_buffer(self) -> None:
"""Clear the output buffer and reset the read cursor."""
@property
def transcript(self) -> list[TranscriptEntry]:
"""Full session transcript with timestamps (ring buffer)."""
# --- Connection health ---
@property
def reader_error(self) -> Exception | None:
"""Last exception from the background reader task, if any."""
# --- GMCP (optional) ---
async def send_gmcp(self, package: str, data: dict[str, Any]) -> None:
"""Send a GMCP message."""
async def expect_gmcp(
self,
package: str,
*,
timeout: float | None = None,
) -> dict[str, Any]:
"""Wait for a GMCP message on the given package."""
@property
def gmcp_messages(self) -> list[tuple[str, dict[str, Any]]]:
"""All received GMCP messages."""
@dataclass(frozen=True)
class Match:
"""Result of a successful expect() call."""
text: str # The matched text
line: str # The full line containing the match
pattern: str # The pattern that matched
elapsed: float # Seconds waited
buffer_before: str # Buffer context before the match (last N lines)
@dataclass(frozen=True)
class TranscriptEntry:
"""Single entry in the session transcript."""
timestamp: float
direction: Literal["send", "recv", "gmcp_send", "gmcp_recv"]
text: str
class ExpectTimeout(AssertionError):
"""Raised when expect() times out.
Attributes:
pattern: What we were looking for.
timeout: How long we waited.
buffer: What was actually received.
"""
pattern: str
timeout: float
buffer: str
class ConnectionClosed(Exception):
"""Raised when the server closes the connection during expect().
Attributes:
buffer: Text received before the connection closed.
"""
buffer: str
class UnexpectedMatch(AssertionError):
"""Raised by expect_not() when the forbidden pattern appears.
Attributes:
pattern: The pattern that should not have appeared.
match: The Match object showing where it appeared.
"""
pattern: str
match: Match
Fixture API¶
@pytest.fixture(scope="session")
async def mud_server() -> AsyncGenerator[MUDServer, None]:
"""Start a MAID server for E2E testing.
Starts GameEngine with in-memory storage on an ephemeral port.
Waits for readiness probe before yielding.
The server runs for the entire pytest session.
Yields:
MUDServer with .host, .port, .engine, .is_alive() attributes.
"""
@pytest.fixture
async def raw_client(mud_server: MUDServer) -> AsyncGenerator[MUDClient, None]:
"""A connected MUDClient with no login.
Use this for testing the login/registration flow itself.
Skips if server is not alive (crashed in a previous test).
"""
@pytest.fixture
async def mud_client(
mud_server: MUDServer,
registered_account: AccountInfo,
) -> AsyncGenerator[MUDClient, None]:
"""A MUDClient that is logged in and in the game world.
Uses the configured AuthDriver to handle the login flow.
Ready to send game commands immediately.
Skips if server is not alive.
"""
@pytest.fixture(scope="session")
async def registered_account(mud_server: MUDServer) -> AccountInfo:
"""An account registered on the test server.
Created once per session via the AuthDriver registration flow.
Validates account still exists on each use.
"""
@dataclass
class MUDServer:
"""Handle to the running test server."""
host: str
port: int
engine: GameEngine
def is_alive(self) -> bool:
"""Check if the engine is still running."""
async def wait_ticks(self, n: int = 1) -> None:
"""Block until N game ticks have been processed."""
async def snapshot_world(self) -> WorldSnapshot:
"""Capture current world state for later restoration."""
async def restore_world(self, snapshot: WorldSnapshot) -> None:
"""Restore world state from a snapshot."""
@dataclass
class AccountInfo:
"""Credentials for a test account."""
username: str
password: str
email: str
Script DSL API¶
# Full schema for a TelTest script
name: string # Required. Test name (becomes pytest node ID)
description: string # Optional. Shown in verbose output
tags: [string] # Optional. Maps to pytest markers
timeout: float # Optional. Default timeout for all steps (default: 5.0)
setup: # Optional. Pre-test configuration
account: # Account to use
username: string
password: string
register: bool # If true, register the account first
character: # Character to create/select
name: string
race: string # e.g. "human", "elf"
class: string # e.g. "warrior", "mage"
select: int # Or select existing by index
steps: # Required. Test actions
- send: string # Send a command
expect: string # Wait for substring (shorthand)
expect_re: string # Wait for regex
expect_all: [string] # Wait for ALL substrings (any order)
expect_any: [string] # Wait for ANY substring
expect_not: # Assert text does NOT appear before boundary
pattern: string
until: string # Required positive boundary
expect_prompt: string # Wait for specific prompt
timeout: float # Per-step timeout override
delay: float # Wait N seconds before this step
- branch: string # Send text and branch on response
cases:
"pattern A": # If output contains this...
- send: "..." # ...run these steps
"pattern B":
- send: "..."
- note: string # Comment in the transcript (no action)
- group: string # Named step group for reporting
steps: [...] # Nested steps
teardown: # Optional. Cleanup actions
- send: "quit"
Telnet Protocol Handling¶
The MUDClient must handle telnet at the byte level. The design isolates protocol
handling into a TelnetProtocol layer.
TCP bytes in
│
▼
┌──────────────┐
│ TelnetProtocol│──── IAC negotiation (auto-respond)
│ │──── MCCP2 decompression
│ │──── GMCP extraction
└──────┬───────┘
│ clean text
▼
┌──────────────┐
│ ANSIStripper │──── Remove \x1b[...m sequences
└──────┬───────┘
│ plain text
▼
┌──────────────┐
│ OutputBuffer │──── Line splitting, prompt detection
└──────────────┘
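The `ANSIStripper` stage reduces to a single regex in practice. A sketch (the pattern below covers CSI sequences generally — colors like `\x1b[1;32m` plus cursor movement — which is an assumption beyond the `\x1b[...m` shown in the diagram):

```python
import re

# CSI sequences: ESC [ <params> <intermediates> <final byte in @..~>
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences so expect() matches plain text."""
    return ANSI_CSI.sub("", text)
```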
IAC negotiation strategy:
| Server sends | Client responds | Notes |
|---|---|---|
| `WILL GMCP (201)` | `DO GMCP` (if gmcp enabled) or `DONT GMCP` | GMCP opt-in |
| `WILL ECHO (1)` | `DO ECHO` | Password hiding |
| `WILL SGA (3)` | `DO SGA` | Standard |
| `DO TTYPE (24)` | `WILL TTYPE`, then send "TELTEST" | Identifies as TelTest client |
| `DO NAWS (31)` | `WILL NAWS`, then send 80×24 | Standard terminal size |
| `DO MSDP (69)` | `WONT MSDP` | Not needed for testing |
| `DO MXP (91)` | `WONT MXP` | Not needed for testing |
| `WILL MCCP2 (86)` | `DONT MCCP2` | Compression adds complexity; decline by default |
| Anything else | `WONT` or `DONT` | Safe default: refuse unknown options |
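The refuse-by-default strategy in the table can be sketched as a byte-level responder using the standard RFC 854/855 codes. `negotiate` is a hypothetical helper name, and the TTYPE/NAWS subnegotiation follow-ups (sending "TELTEST" and 80×24) are omitted:

```python
# Telnet command and option codes (RFC 854/855 and MUD extensions)
IAC, WILL, WONT, DO, DONT = 255, 251, 252, 253, 254
ECHO, SGA, TTYPE, NAWS, GMCP = 1, 3, 24, 31, 201

ACCEPT_DO = {ECHO, SGA}      # server-side options we accept (reply DO)
ACCEPT_WILL = {TTYPE, NAWS}  # client-side options we agree to enable (reply WILL)


def negotiate(verb: int, option: int, gmcp_enabled: bool = False) -> bytes:
    """Respond to a single IAC negotiation (verb is WILL or DO)."""
    if verb == WILL:  # server offers to enable `option` on its side
        ok = option in ACCEPT_DO or (option == GMCP and gmcp_enabled)
        return bytes([IAC, DO if ok else DONT, option])
    if verb == DO:    # server asks us to enable `option` on our side
        ok = option in ACCEPT_WILL
        return bytes([IAC, WILL if ok else WONT, option])
    raise ValueError(f"unexpected negotiation verb: {verb}")
```

Keeping the accept-sets explicit makes the "anything else → refuse" row fall out naturally: an unknown option is simply not in either set.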
Prompt detection:
Prompts are text that arrives without a trailing \r\n. The buffer tracks whether
the last chunk ended mid-line. When expect_prompt() is called, it matches against
the current unterminated line in the buffer.
State Isolation & Determinism¶
This section addresses the #1 concern from adversarial review. All three reviewers (Opus, Codex, Gemini) identified shared mutable state as the most critical risk to test reliability.
The Problem¶
The mud_server fixture is session-scoped for performance (~2s startup cost).
But this means all tests share one GameEngine and one World. Without isolation:
- Test A creates character "Thorn" → Test B sees "Thorn" in the room
- Test C drops a sword → Test D picks it up
- Test ordering becomes load-bearing → flaky CI
Solution: World Snapshot/Restore¶
The mud_server fixture exposes snapshot_world() / restore_world() methods.
A function-scoped autouse fixture captures state before each test and restores
it after:
@pytest.fixture(autouse=True)
async def _isolate_world(mud_server: MUDServer) -> AsyncGenerator[None, None]:
"""Snapshot and restore world state around each test."""
snapshot = await mud_server.snapshot_world()
yield
await mud_server.restore_world(snapshot)
What gets snapshot/restored:
- Entity registry (entities, components, tags)
- Room index and spatial data
- In-memory document store collections
- Command registry state
What is NOT restored (intentionally):
- Account registry (session-scoped accounts persist)
- Engine configuration
- Network listener state
Tick Synchronization¶
The game engine runs a tick loop that processes systems asynchronously. Tests that depend on system processing (combat, NPC AI, item respawn) need to synchronize with the tick loop.
# Wait for the combat system to process the attack
await mud_client.send("attack goblin")
await mud_server.wait_ticks(2) # Wait for 2 ticks to process
await mud_client.expect("You hit")
API:
class MUDServer:
async def wait_ticks(self, n: int = 1) -> None:
"""Block until N ticks have been processed.
Uses an asyncio.Event that the engine tick loop signals after each tick.
"""
async def pause_ticks(self) -> None:
"""Pause the tick loop for deterministic step-through testing."""
async def step_tick(self) -> None:
"""Process exactly one tick while paused."""
async def resume_ticks(self) -> None:
"""Resume the tick loop after pausing."""
Design decision: Tick sync is exposed on `MUDServer` (not `MUDClient`) because it requires engine access. The client is intentionally ignorant of the server's internals to maintain the portability boundary.
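The `wait_ticks()` primitive can be sketched with an `asyncio.Condition` (an illustrative reduction; `TickSync` is a hypothetical name, and a production version would also add a timeout):

```python
import asyncio


class TickSync:
    """Sketch of wait_ticks(): the engine's tick loop calls
    tick_completed() after each tick; tests await a target count."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = asyncio.Condition()

    async def tick_completed(self) -> None:
        async with self._cond:
            self._count += 1
            self._cond.notify_all()  # wake any waiting tests

    async def wait_ticks(self, n: int = 1) -> None:
        async with self._cond:
            target = self._count + n
            await self._cond.wait_for(lambda: self._count >= target)
```

Counting ticks (rather than setting a bare event) means a test that asks for two ticks cannot be woken early by a single tick that happened to land between its statements.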
Unique Test Data¶
To further reduce state coupling, fixtures generate unique data per test:
from uuid import uuid4


@pytest.fixture
def unique_name() -> str:
"""Generate a unique character/account name per test."""
return f"test_{uuid4().hex[:8]}"
This prevents name collisions even when snapshot/restore is not perfect.
Test Patterns & Examples¶
Pattern 1: Full Login Flow¶
@pytest.mark.e2e
async def test_register_login_create_character(raw_client: MUDClient) -> None:
"""Test complete new-player experience."""
# Registration
await raw_client.expect("Please select an option")
await raw_client.send("R")
await raw_client.expect("Choose a username")
await raw_client.send("testplayer")
await raw_client.expect("Email address")
await raw_client.send("test@example.com")
await raw_client.expect("Choose a password")
await raw_client.send("secret123")
await raw_client.expect("Confirm password")
await raw_client.send("secret123")
await raw_client.expect("Account created")
# Character creation
await raw_client.expect("Character Selection")
await raw_client.send("C")
await raw_client.expect("Enter character name")
await raw_client.send("Thorn")
await raw_client.expect("Select your race")
await raw_client.send("1") # Human
await raw_client.expect("Select your class")
await raw_client.send("1") # Warrior
await raw_client.expect("Select your gender")
await raw_client.send("3") # Neutral
await raw_client.expect("Character Summary")
await raw_client.send("Y")
await raw_client.expect("Welcome to the world, Thorn!")
# In world
await raw_client.expect("Entering World")
await raw_client.send("quit")
await raw_client.expect("Goodbye")
Pattern 2: Gameplay Commands¶
@pytest.mark.e2e
async def test_look_and_move(mud_client: MUDClient) -> None:
"""Test basic navigation. mud_client is already logged in."""
await mud_client.send("look")
match = await mud_client.expect(re.compile(r"exits?:", re.IGNORECASE))
assert match.text # Room has exits
await mud_client.send("inventory")
await mud_client.expect_any(["You are carrying", "Your inventory is empty"])
Pattern 3: Multi-Client Interaction¶
@pytest.mark.e2e
async def test_player_sees_other_player(mud_server: MUDServer) -> None:
"""Two players in the same room can see each other."""
async with logged_in_client(mud_server, "alice") as alice, \
logged_in_client(mud_server, "bob") as bob:
# Both start in the same room
await alice.send("look")
await alice.expect("Bob")
await bob.send("say Hello!")
await alice.expect("Bob says")
Pattern 4: YAML Script¶
name: "Basic navigation smoke test"
tags: [smoke, navigation]
setup:
account:
username: "navtest"
password: "testpass123"
register: true
character:
name: "Navigator"
race: "human"
class: "warrior"
steps:
- send: "look"
expect: "exits"
- send: "inventory"
expect_any: ["carrying", "empty"]
- send: "quit"
expect: "Goodbye"
Error Handling & Diagnostics¶
ExpectTimeout Failure Output¶
When expect() times out, the error message provides full context:
teltest.ExpectTimeout: Pattern not found within 5.0s
Expected: "Welcome to the world"
Received (last 20 lines):
─────────────────────────
| Select your gender:
|
| [1] Male
| [2] Female
| [3] Neutral
|
| [C]ancel - Return to selection
|
| Select gender (1-3): █ ← (prompt, waiting for input)
─────────────────────────
Transcript (last 5 interactions):
[0.00s] RECV: "Character Summary"
[0.01s] RECV: "Create this character?"
[0.50s] SEND: "Y"
[0.52s] RECV: "Character created successfully."
[5.00s] TIMEOUT waiting for "Welcome to the world"
Hint: The server may be waiting for additional input. Check if
a prompt was expected before this step.
Connection Failure¶
teltest.ConnectionFailed: Could not connect to localhost:4000
Attempted: 3 retries over 6.0s
Last error: ConnectionRefusedError: [Errno 111] Connection refused
Hint: Is the server running? Check the mud_server fixture log.
Diagnostics Features¶
- `--teltest-verbose` — Print the full transcript to stdout on failure
- `--teltest-record` — Save a `.teltest` recording for every test
- `--teltest-timeout-multiplier=N` — Scale all timeouts (useful for slow CI)
- Transcript on every assertion failure — via a pytest plugin hook
Configuration¶
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| `TELTEST_HOST` | `localhost` | Server host for standalone runs |
| `TELTEST_PORT` | `4000` | Server port for standalone runs |
| `TELTEST_TIMEOUT` | `5.0` | Default expect timeout (seconds) |
| `TELTEST_TIMEOUT_MULTIPLIER` | `1.0` | Multiply all timeouts (for CI) |
| `TELTEST_STRIP_ANSI` | `true` | Strip ANSI escape codes |
| `TELTEST_SCRIPT_DIR` | `tests/e2e/scripts` | YAML script search directory |
| `TELTEST_RECORDING_DIR` | `tests/e2e/recordings` | Recording output directory |
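Resolution of these variables can be sketched as a small loader. The defaults mirror the table; the `load_config` name and return shape are assumptions for illustration, not teltest API.

```python
# Illustrative env-var loader. Note the multiplier is applied to the
# base timeout so CI can scale everything with one variable.
import os

def load_config(env=None):
    env = os.environ if env is None else env
    multiplier = float(env.get("TELTEST_TIMEOUT_MULTIPLIER", "1.0"))
    return {
        "host": env.get("TELTEST_HOST", "localhost"),
        "port": int(env.get("TELTEST_PORT", "4000")),
        "timeout": float(env.get("TELTEST_TIMEOUT", "5.0")) * multiplier,
        "strip_ansi": env.get("TELTEST_STRIP_ANSI", "true").lower() == "true",
    }


cfg = load_config({"TELTEST_TIMEOUT": "2.0", "TELTEST_TIMEOUT_MULTIPLIER": "3.0"})
print(cfg["timeout"], cfg["port"])
```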
pytest.ini / pyproject.toml¶
```toml
[tool.pytest.ini_options]
markers = [
    "e2e: End-to-end tests requiring a running server",
]

[tool.teltest]
timeout = 5.0
timeout_multiplier = 1.0
strip_ansi = true
script_dirs = ["tests/e2e/scripts"]
server_startup_timeout = 15.0
```
CI/CD Integration¶
Added post-review. Opus and Gemini flagged the missing CI section as a gap given success criterion SC-4.
GitHub Actions Workflow¶
```yaml
name: E2E Tests
on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      MAID_DEBUG: "true"
      TELTEST_TIMEOUT_MULTIPLIER: "2.0"
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v4
      - name: Install dependencies
        run: uv sync
      - name: Run E2E tests
        run: uv run pytest tests/e2e/ -m e2e -x --timeout=120
      - name: Upload transcripts on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: teltest-transcripts
          path: tests/e2e/recordings/
```
Key CI Considerations¶
- Timeout multiplier: Set `TELTEST_TIMEOUT_MULTIPLIER=2.0` in CI — shared runners have variable CPU availability
- `-x` (fail fast): Stop on first failure, since a server crash makes all subsequent tests pointless
- Separate from unit tests: Run `pytest -m e2e` separately from `pytest -m "not e2e"` so unit test failures don't block E2E and vice versa
- Port isolation: Ephemeral ports via `socket.bind(('', 0))` — no hardcoded ports
- Zombie prevention: The `mud_server` fixture registers an `atexit` handler to terminate the engine; the GitHub Actions `timeout-minutes` provides a hard cap
- Docker networking: `localhost` works in CI containers — no special networking needed, since the server and tests run on the same runner
- pytest-xdist support: For parallel execution, each xdist worker gets its own `mud_server` fixture (session-scoped per worker); port allocation is already per-instance via ephemeral binding
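The ephemeral-port strategy above is a few lines of stdlib code. A sketch, not the fixture itself: note the small window in which the OS could hand the port to another process between close and reuse, which the fixture's readiness probe absorbs.

```python
# Bind port 0, let the OS choose a free ephemeral port, report it.
# The fixture would pass this number to the engine it spawns.
import socket

def allocate_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))             # port 0 → OS assigns a free one
        return sock.getsockname()[1]   # the port the OS actually chose


port = allocate_port()
print(port)
```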
Running E2E Locally¶
```bash
# Run all E2E tests
uv run pytest tests/e2e/ -m e2e

# Run just YAML script tests
uv run pytest tests/e2e/test_scripts.py

# Run with verbose transcript output
uv run pytest tests/e2e/ -m e2e --teltest-verbose

# Validate YAML scripts without running
uv run teltest validate tests/e2e/scripts/
```
Package Structure¶
```text
packages/teltest/
├── pyproject.toml            # Standalone package, no MAID deps
├── src/teltest/
│   ├── __init__.py           # Public API exports
│   ├── client.py             # MUDClient class
│   ├── protocol.py           # TelnetProtocol (IAC handling)
│   ├── ansi.py               # ANSI escape code stripper
│   ├── buffer.py             # OutputBuffer with line/prompt tracking
│   ├── expect.py             # Expectation engine (Match, ExpectTimeout)
│   ├── gmcp.py               # GMCP send/receive helpers
│   ├── recorder.py           # Session recording
│   ├── script/
│   │   ├── __init__.py
│   │   ├── schema.py         # YAML script schema (Pydantic models)
│   │   ├── runner.py         # Script execution engine
│   │   └── converter.py      # Recording → script converter
│   ├── cli.py                # CLI entry point (record, convert, run)
│   └── pytest_plugin.py      # Fixtures, markers, CLI options
│
├── tests/                    # TelTest's own tests
│   ├── test_client.py
│   ├── test_protocol.py
│   ├── test_ansi.py
│   ├── test_buffer.py
│   ├── test_expect.py
│   ├── test_script_runner.py
│   └── conftest.py
│
└── README.md                 # Standalone package docs

# MAID-specific E2E tests live in the main repo:
tests/e2e/
├── conftest.py               # mud_server, mud_client, registered_account fixtures
├── test_login.py             # Login/registration flow tests
├── test_character.py         # Character creation/selection tests
├── test_gameplay.py          # Core gameplay command tests
├── test_multiplayer.py       # Multi-client interaction tests
└── scripts/                  # YAML test scripts
    ├── smoke_login.yaml
    ├── smoke_navigation.yaml
    └── character_creation_matrix.yaml
```
**Key decision:** TelTest is a standalone package with zero MAID dependencies. It can be published independently and used with any MUD that speaks telnet. The MAID-specific fixtures and tests live in the main repo's `tests/e2e/` directory.
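The portability boundary can be made concrete with the `AuthDriver` abstraction noted in Appendix C: teltest stays engine-agnostic, and each game supplies its own login choreography. A hedged sketch; the protocol shape, prompts, and `FakeClient` helper are illustrative, not the implemented API.

```python
# Sketch of the AuthDriver separation: the Protocol lives in teltest,
# the MAID driver lives in tests/e2e/. Prompts below are illustrative.
import asyncio
from typing import Protocol

class AuthDriver(Protocol):
    async def login(self, client, username: str, password: str) -> None: ...

class MAIDAuthDriver:
    """MAID's login choreography — kept out of the teltest package."""
    async def login(self, client, username, password):
        await client.expect("select an option")
        await client.send("L")
        await client.expect("Username:")
        await client.send(username)
        await client.expect("Password:")
        await client.send(password)


class FakeClient:
    """Records the conversation instead of talking to a real server."""
    def __init__(self):
        self.log = []

    async def expect(self, pattern):
        self.log.append(("expect", pattern))

    async def send(self, line):
        self.log.append(("send", line))


fake = FakeClient()
asyncio.run(MAIDAuthDriver().login(fake, "alice", "secret"))
print([entry for kind, entry in fake.log if kind == "send"])
```

Any other engine would plug in its own driver without touching teltest itself.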
Migration & Adoption¶
Phase 1: Core Client & Fixtures¶
- Implement `MUDClient`, `TelnetProtocol`, `OutputBuffer`, `ANSIStripper`
- Implement `mud_server`, `raw_client`, `mud_client` fixtures
- Write first E2E test: full login→create→look→quit flow
- Add `pytest -m e2e` marker to CI pipeline
Phase 2: Script DSL & Recording¶
- Implement YAML script schema and runner
- Implement session recorder and converter
- Record reference sessions for smoke tests
- Onboard QA team with script authoring docs
Phase 3: Advanced Features¶
- GMCP assertion support
- Multi-client test helpers
- Parametrized character creation matrix (all race/class combos)
- CI integration with timeout multiplier and parallel execution
Future Work¶
- WebSocket client — A companion `WSMUDClient` for testing the `/ws/game` path with JSON message assertions
- Fuzz testing — Random input generation to find crash-inducing sequences
- Visual diff — Side-by-side comparison of expected vs. actual session transcripts
- MUD compatibility matrix — Test TelTest against other MUD engines (Evennia, Ranvier, etc.) to validate portability
- pytest-xdist support — Parallel E2E test execution with isolated server instances
- AI-assisted test generation — Use LLM to convert natural-language scenarios ("test that a warrior can equip a sword") into test scripts
Appendix A: Telnet Protocol Reference¶
| Byte | Name | Description |
|---|---|---|
| 255 | IAC | Interpret As Command |
| 254 | DONT | Refuse option |
| 253 | DO | Request option |
| 252 | WONT | Will not use option |
| 251 | WILL | Will use option |
| 250 | SB | Subnegotiation Begin |
| 240 | SE | Subnegotiation End |
| 1 | ECHO | Echo |
| 3 | SGA | Suppress Go-Ahead |
| 24 | TTYPE | Terminal Type |
| 31 | NAWS | Window Size |
| 69 | MSDP | MUD Server Data Protocol |
| 70 | MSSP | MUD Server Status Protocol |
| 86 | MCCP2 | Compression v2 |
| 91 | MXP | MUD eXtension Protocol |
| 201 | GMCP | Generic MUD Communication Protocol |
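Passive negotiation over these bytes is mechanical: refuse what you don't support, accept what you do. A minimal sketch, assuming the client accepts only GMCP and declines everything else (consistent with the MCCP2-declined default); real handling must also frame `SB … SE` subnegotiation payloads.

```python
# Byte values from the table above. Reply to IAC <command> <option>:
# DO → WILL/WONT, WILL → DO/DONT; only GMCP (201) is accepted here.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251
GMCP = 201

def negotiate(command: int, option: int) -> bytes:
    """Build the reply to a single IAC <command> <option> sequence."""
    if command == DO:
        return bytes([IAC, WILL if option == GMCP else WONT, option])
    if command == WILL:
        return bytes([IAC, DO if option == GMCP else DONT, option])
    return b""  # other commands carry no option to answer


print(negotiate(WILL, GMCP).hex())  # accept the server's GMCP offer
print(negotiate(DO, 86).hex())      # decline MCCP2 compression
```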
Appendix B: MAID Login Flow Reference¶
```text
Connect to server
│
▼
"Please select an option"
[L]ogin [R]egister [Q]uit
│
├── L ──► "Username: " → "Password: " → Login validation
│                                            │
│                                  ┌─────────┴─────────┐
│                               Success           Fail (3 max)
│                                  │                   │
│                                  ▼                   ▼
│                          Character Selection    Disconnect
│
├── R ──► "Choose a username: " → "Email: " → "Password: " → "Confirm: "
│                                            │
│                                     Account Created
│                                            │
│                                   Character Selection
│
└── Q ──► Disconnect

Character Selection
│
├── [N] Select existing ──► "Welcome back, {name}!" ──► Enter World
│
└── [C] Create new ──► Name → Race (1-10) → Class (1-10) → Gender (1-3)
                       → Confirm (Y/N) → "Welcome to the world!" → Enter World

Enter World
│
▼
"[Entering World...]"
{look output}
"> " (prompt)
│
▼
Main Loop: receive command → execute → send output → prompt
│
└── "quit" / "exit" ──► "Goodbye!" ──► Disconnect
```
Appendix C: Adversarial Review Summary¶
This design was reviewed by three adversarial agents (Claude Opus, GPT Codex, Gemini Pro) tasked with finding flaws, gaps, and unrealistic assumptions. Their feedback was synthesized and incorporated into the design above. This appendix records the key findings for traceability.
Unanimous Critical Findings (all 3 reviewers)¶
| # | Finding | Resolution |
|---|---|---|
| 1 | Session-scoped server with shared mutable world state causes test bleed | Added State Isolation section with snapshot/restore |
| 2 | `expect()` scanning full buffer matches stale output from prior commands | Redesigned to cursor-based matching; `from_start` opt-in for full scan |
| 3 | Portability claim undermined by MAID-coupled fixtures | Added AuthDriver protocol abstraction in fixtures |
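The cursor-based resolution of finding #2 can be sketched as follows; the `CursorBuffer` name and method signatures are illustrative, not the teltest `OutputBuffer` source.

```python
# Every successful match advances a cursor, so a later expect() cannot
# re-match output an earlier step already consumed. from_start=True
# restores the old full-buffer scan as an explicit opt-in.
import re

class CursorBuffer:
    def __init__(self):
        self.text = ""
        self.cursor = 0

    def feed(self, chunk: str) -> None:
        self.text += chunk

    def search(self, pattern: str, from_start: bool = False):
        start = 0 if from_start else self.cursor
        match = re.search(pattern, self.text[start:])
        if match:
            self.cursor = start + match.end()  # consume up to the match
        return match


buf = CursorBuffer()
buf.feed("Welcome back, Alice!\n> ")
assert buf.search("Welcome") is not None
assert buf.search("Welcome") is None             # stale output: no re-match
assert buf.search("Welcome", from_start=True)    # explicit full scan
```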
Major Findings (2+ reviewers)¶
| # | Finding | Resolution |
|---|---|---|
| 4 | `expect_not()` is an unconditional sleep with no correctness guarantee | Redesigned with required `until` positive boundary parameter |
| 5 | No tick synchronization primitive for deterministic testing | Added wait_ticks(), pause_ticks(), step_tick() to MUDServer |
| 6 | Prompt detection via missing `\r\n` is a TCP fragmentation race | Added multi-signal detection (GA, regex, quiescence) |
| 7 | No server readiness check before tests begin | Added readiness probe retry loop in mud_server fixture |
| 8 | Reader task lifecycle gaps (crash, EOF, cancellation) | Specified reader contract: capture + re-raise, sentinel on disconnect |
| 9 | Unbounded transcript buffer | Changed to ring buffer (10k lines default) |
| 10 | Missing CI/CD section | Added CI/CD Integration section |
| 11 | `expect_sequence()` timeout semantics ambiguous | Added explicit `timeout` (total) vs `per_step_timeout` parameters |
| 12 | No server liveness check after crash | Added is_alive() with pytest.skip in fixtures |
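Finding #4's resolution replaces the sleep with a bounded scan: the forbidden pattern must be absent from the output that arrived before a required positive boundary. A sketch under that assumption; the free-function signature is illustrative, not the client method.

```python
# Scan only the window before the boundary pattern. If the boundary
# never arrives, that is itself a failure — no silent sleep-and-hope.
import re

def expect_not(output: str, forbidden: str, until: str) -> None:
    boundary = re.search(until, output)
    if boundary is None:
        raise AssertionError(f"boundary {until!r} never matched")
    window = output[: boundary.start()]
    if re.search(forbidden, window):
        raise AssertionError(f"{forbidden!r} appeared before {until!r}")


# Passes: no "error" arrives before the prompt boundary.
expect_not("You pick up the sword.\n> ", "error", "> ")
```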
Notable Minor Findings¶
- YAML DSL needs `branch` support for non-happy-path scenarios (added)
- YAML script validation should happen before execution (added `teltest validate` CLI)
- `logged_in_client()` helper used in examples but not defined in API (noted for implementation)
- Session recording should optionally capture IAC negotiation bytes (noted for Phase 3)
- Tags-to-markers mapping needs explicit rules (noted for implementation)
Reviewer Kudos (consensus strengths)¶
- Standalone package separation (`packages/teltest/` vs `tests/e2e/`)
- `ExpectTimeout` error output design (buffer dump + transcript + hints)
- Ephemeral port allocation for CI
- ANSI stripping as default with opt-out
- MCCP2 declined by default (pragmatic)
- Phased migration plan (ship Phase 1 first)