TelTest — E2E Testing Framework for MUDs

Version: 1.1 (post-adversarial review)
Status: Implemented
Author(s): MAID Core Team
Date: 2026-02-18
Priority: P1 — Critical for regression safety as game content grows


Executive Summary

TelTest is a pytest-native E2E testing framework for MUD engines. It provides an async telnet client that connects to a running server and drives it through text-based send/expect interactions — the same way a real player would. Think of it as "Playwright for MUDs."

Key capabilities:

  1. MUDClient — An async Python telnet client with pattern-matching expect() that handles IAC negotiation, ANSI stripping, and prompt detection
  2. Pytest fixtures — Managed server lifecycle (start engine → run tests → teardown) with automatic port allocation to avoid conflicts
  3. Script DSL — Declarative YAML-based test scripts for non-programmers to author E2E scenarios
  4. Session recording — Capture real play sessions and convert them into reproducible test cases

Business impact: Enables regression testing of the full player experience — login, character creation, combat, quests — across every commit. Catches integration bugs that unit tests miss (protocol negotiation, command routing, state transitions across ticks).


Problem Statement & Current State

What Exists Today

| Layer | Test Coverage | Notes |
|---|---|---|
| ECS core | ✅ Extensive | Unit tests for World, Entity, Component, System |
| Command handlers | ✅ Good | MockSession-based tests for individual commands |
| Content packs | ✅ Good | Unit tests for pack protocol compliance |
| Network protocol | ⚠️ Minimal | Mocked asyncio.start_server, no real connections |
| Login/auth flow | ❌ None | CharacterHandler/LoginHandler untested end-to-end |
| Full player journey | ❌ None | No test connects, logs in, plays, and disconnects |
| Cross-system interactions | ❌ None | Combat→inventory→quest chains untested |

The Problem

  • No full-stack tests. A bug where character.race is a string instead of an enum passes all unit tests but crashes every new player on login (we just fixed this exact bug).
  • No protocol-level tests. Telnet negotiation, GMCP dispatch, MCCP compression, and prompt detection are tested only via mocks.
  • Manual QA is the only safety net. Every release requires someone to telnet in and manually walk through character creation.
  • Content pack interactions are invisible. When stdlib changes break classic-rpg, nothing catches it until runtime.

Success Criteria

| ID | Criterion | Metric |
|---|---|---|
| SC-1 | Full login→play→quit flow tested | ≥1 test per connection path (telnet, websocket) |
| SC-2 | Character creation tested | All races/classes create successfully |
| SC-3 | Core gameplay loop tested | look, move, get, drop, inventory, say |
| SC-4 | Tests run in CI | < 60s total, no port conflicts, no flakiness |
| SC-5 | Non-programmers can author tests | YAML script DSL with clear documentation |
| SC-6 | Portable to other MUDs | Core MUDClient has zero MAID-specific dependencies |

Design Goals & Non-Goals

Goals

  • Pytest-native — Tests are normal async pytest functions using fixtures
  • Real network I/O — Tests connect over TCP, exercising the full stack
  • Fast — Server starts once per test session; tests reuse the running server instead of restarting it
  • Deterministic — Automatic port allocation, seeded randomness, tick synchronization
  • Portable — MUDClient is a standalone async telnet client usable with any MUD
  • Debuggable — Rich failure messages showing expected vs. received output with context

Non-Goals

  • GUI testing — Web frontend testing is out of scope (Playwright covers that)
  • Load/stress testing — Performance benchmarks are a separate concern
  • AI response testing — LLM outputs are non-deterministic; test the plumbing, not the prose
  • Cross-network testing — Tests run against localhost only

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                        pytest session                        │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │  test_login.py│  │test_combat.py│  │ test_scripts.py   │  │
│  │              │  │              │  │  (YAML runner)    │  │
│  └──────┬───────┘  └──────┬───────┘  └────────┬──────────┘  │
│         │                 │                    │             │
│         ▼                 ▼                    ▼             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              TelTest Fixtures Layer                  │    │
│  │  • mud_server (session-scoped, starts GameEngine)   │    │
│  │  • mud_client (function-scoped, connects & logs in) │    │
│  │  • raw_client (function-scoped, bare connection)    │    │
│  └──────────────────────┬──────────────────────────────┘    │
│                         │                                    │
│                         ▼                                    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                   MUDClient                          │    │
│  │  • connect() / disconnect()                         │    │
│  │  • send(text) / expect(pattern)                     │    │
│  │  • expect_prompt() / expect_sequence()              │    │
│  │  • ANSI stripping / IAC negotiation                 │    │
│  │  • Output buffer with history                       │    │
│  └──────────────────────┬──────────────────────────────┘    │
│                         │ TCP                                │
└─────────────────────────┼───────────────────────────────────┘
              ┌───────────────────────┐
              │     MAID Server       │
              │  (GameEngine + Telnet │
              │   on ephemeral port)  │
              └───────────────────────┘

Data Flow

  1. pytest discovers test functions and invokes session-scoped mud_server fixture
  2. mud_server starts a GameEngine with in-memory storage on a random free port
  3. Each test gets a mud_client — a MUDClient instance connected over TCP
  4. Test calls send() / expect() to drive the interaction
  5. MUDClient handles telnet IAC negotiation transparently
  6. On function-scoped fixture teardown the client disconnects; the server stops when the pytest session ends

Core Components

1. MUDClient — Async Telnet Driver

The heart of the framework. A pure-async telnet client with pattern-matching output expectations. Zero MAID-specific dependencies — it speaks raw telnet.

Responsibilities:

  • TCP connection management with configurable timeouts
  • Telnet IAC negotiation (respond to WILL/DO/WONT/DONT)
  • Optional GMCP support (send/receive structured data)
  • ANSI escape code stripping for clean text matching
  • Output buffering with rolling history
  • Pattern-based expect() with timeout and failure diagnostics
  • Prompt detection (text not terminated by \r\n)

Key design decisions:

  • Async-native — Built on asyncio.open_connection, no threads
  • Non-blocking reads — Continuous background reader task fills an output buffer; expect() scans from a read cursor with an asyncio.Event for new data
  • Cursor-based matching — Each expect() and send() advances a read cursor so assertions only match text received since the last interaction
  • IAC handled transparently — Tests never see protocol bytes
  • ANSI stripping is configurable — MUDClient(strip_ansi=True) is the default, so matching runs on plain text unless a test turns stripping off
  • History preserved — Full session transcript available for debugging (ring buffer, configurable max size, default 10,000 lines)
  • Reader task contract — Created in connect(), cancelled in disconnect(). Reader exceptions are captured and re-raised in the next expect() call. disconnect() sets a sentinel that unblocks any waiting expect(). A reader_error property exposes the last exception for diagnostics.
  • EOF/crash handling — Socket close is detected immediately. If the server drops the connection, expect() raises ConnectionClosed (not a timeout). Fixtures can catch this to fail fast or skip remaining tests.
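
The reader-task contract and EOF handling described above can be sketched with a background task, an asyncio.Event, and a closed flag. This is a minimal illustration under stated assumptions, not TelTest's implementation: MiniReader, its wait_for() method, and the local ConnectionClosed class are hypothetical names, and the stream is fed by hand instead of coming from a socket.

```python
import asyncio


class ConnectionClosed(Exception):
    """Server closed the connection before the pattern arrived."""


class MiniReader:
    """Hypothetical reduction of a background reader plus a waiting expect()."""

    def __init__(self, stream: asyncio.StreamReader) -> None:
        self._stream = stream
        self.buffer = ""
        self.closed = False
        self._new_data = asyncio.Event()

    async def run(self) -> None:
        """Background task: copy bytes into the buffer until EOF."""
        while True:
            chunk = await self._stream.read(4096)
            if not chunk:  # an empty read means the peer closed the socket
                self.closed = True
                self._new_data.set()  # wake any waiter so it can fail fast
                return
            self.buffer += chunk.decode("utf-8", errors="replace")
            self._new_data.set()

    async def wait_for(self, text: str, timeout: float = 1.0) -> None:
        """Block until text is buffered, the connection closes, or time runs out."""

        async def _scan() -> None:
            while text not in self.buffer:
                if self.closed:
                    raise ConnectionClosed(self.buffer)
                self._new_data.clear()
                await self._new_data.wait()

        await asyncio.wait_for(_scan(), timeout)


async def demo() -> tuple[bool, bool]:
    stream = asyncio.StreamReader()
    reader = MiniReader(stream)
    task = asyncio.create_task(reader.run())

    stream.feed_data(b"Welcome to MAID\r\n")
    await reader.wait_for("Welcome")
    matched = True

    stream.feed_eof()  # simulate the server dropping the connection
    try:
        await reader.wait_for("Goodbye")
        closed_detected = False
    except ConnectionClosed:
        closed_detected = True

    await task
    return matched, closed_detected


result = asyncio.run(demo())
```

The second wait fails as soon as EOF is seen rather than after the full timeout, which is the distinction between ConnectionClosed and ExpectTimeout drawn in the list above.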

2. Expectation Engine

The expect() method is the primary assertion mechanism. It maintains a read cursor that advances through the output buffer, ensuring each expect() only matches text received after the previous expect() or send() call. This prevents stale output from prior commands from satisfying new assertions.

Design decision (post-review): All three adversarial reviewers identified "full buffer scan" as a critical race condition. The cursor-based approach eliminates an entire class of flaky tests where old output matches new patterns.

It supports:

| Mode | Example | Description |
|---|---|---|
| Substring | expect("Welcome") | Waits for text containing "Welcome" |
| Regex | expect(re.compile(r"Level \d+")) | Waits for regex match |
| Exact line | expect_line("Your choice: ") | Matches a complete line exactly |
| Prompt | expect_prompt("> ") | Waits for a prompt (see Prompt Detection below) |
| Sequence | expect_sequence(["Name:", "Race:", "Class:"]) | Ordered multi-pattern |
| Absence | expect_not("Error", until="> ") | Asserts text does NOT appear before boundary |
| Any of | expect_any(["Yes", "No"]) | Returns whichever matches first |
| Full buffer | expect("pattern", from_start=True) | Ignores cursor; searches all output |

Cursor behavior:

  • send() advances the cursor to the current buffer end
  • expect() searches from cursor forward; on match, advances cursor past the match
  • clear_buffer() resets both the buffer and the cursor
  • from_start=True bypasses the cursor for full-history assertions
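
The cursor rules above can be modelled with a small synchronous buffer. This is an illustrative sketch only: CursorBuffer and its method names are hypothetical, waiting and timeouts are left out, and the choice not to move the cursor on a from_start search is an assumption.

```python
from __future__ import annotations

import re


class CursorBuffer:
    """Toy model of a cursor-based output buffer (hypothetical, not TelTest's class)."""

    def __init__(self) -> None:
        self.text = ""   # everything received so far
        self.cursor = 0  # index where the next search starts

    def feed(self, chunk: str) -> None:
        """Append newly received text."""
        self.text += chunk

    def mark_sent(self) -> None:
        """Model send(): skip everything received so far."""
        self.cursor = len(self.text)

    def search(self, pattern: str | re.Pattern[str], *, from_start: bool = False) -> str | None:
        """Model a single non-blocking expect() attempt."""
        regex = re.compile(re.escape(pattern)) if isinstance(pattern, str) else pattern
        start = 0 if from_start else self.cursor
        found = regex.search(self.text, start)
        if found is None:
            return None
        if not from_start:
            self.cursor = found.end()  # advance past the match
        return found.group(0)

    def clear(self) -> None:
        """Model clear_buffer(): reset both buffer and cursor."""
        self.text = ""
        self.cursor = 0


buf = CursorBuffer()
buf.feed("Welcome\r\n> ")
buf.mark_sent()                                   # what send("look") would do
buf.feed("A dusty road.\r\n> ")
stale = buf.search("Welcome")                     # arrived before the send
fresh = buf.search("dusty road")                  # arrived after the send
history = buf.search("Welcome", from_start=True)  # full-history search
```

Here stale is None while history still finds the banner, which is the stale-output case the cursor is meant to rule out.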

Timeout behavior:

  • Default timeout: 5 seconds (configurable per-call and globally)
  • On timeout: raises ExpectTimeout with full buffer dump showing what WAS received
  • Timeout auto-scales: CI environments get 2x multiplier via TELTEST_TIMEOUT_MULTIPLIER
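
A possible way to apply the multiplier is sketched below. Only the TELTEST_TIMEOUT_MULTIPLIER variable name comes from this document; the helper name effective_timeout and the parsing rules (a float, defaulting to 1.0 when unset) are assumptions.

```python
from __future__ import annotations

import os


def effective_timeout(timeout: float | None, default: float = 5.0) -> float:
    """Resolve a per-call timeout, then scale it by the environment multiplier."""
    base = default if timeout is None else timeout
    # Assumption: the variable holds a plain float such as "2" or "1.5".
    multiplier = float(os.environ.get("TELTEST_TIMEOUT_MULTIPLIER", "1.0"))
    return base * multiplier


os.environ["TELTEST_TIMEOUT_MULTIPLIER"] = "2"  # what a CI job might export
ci_default = effective_timeout(None)            # global default, scaled
ci_override = effective_timeout(1.5)            # per-call override, scaled
```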

expect_sequence() timeout semantics:

  • timeout is the total time for all patterns combined (default)
  • per_step_timeout overrides with a per-pattern limit if provided
  • This is documented explicitly to avoid ambiguity

Prompt detection:

Prompts are text that arrives without a trailing \r\n. Because TCP packet fragmentation can cause false positives (a partial line looks like a prompt), expect_prompt() uses a multi-signal approach:

  1. Primary: Telnet GA (Go Ahead) signal, if the server sends it after SGA negotiation
  2. Secondary: Configurable prompt regex (e.g., r"^>|^\w+> ") — matches only known prompt patterns, not arbitrary partial lines
  3. Fallback: Quiescence detection — if no new data arrives for a configurable prompt_settle_time (default 0.3s) and the buffer ends without \r\n, treat as prompt

Design decision (post-review): Gemini identified that relying solely on missing \r\n is a race condition with packet fragmentation. The multi-signal approach provides defense in depth.
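
The three signals can be combined in priority order roughly as follows. This is a sketch under assumptions: prompt_signal is a hypothetical helper, the caller is assumed to track whether GA was seen and how long the connection has been idle, and the default regex is the example pattern quoted above.

```python
from __future__ import annotations

import re

DEFAULT_PROMPT_RE = re.compile(r"^>|^\w+> ")  # example pattern from the list above


def prompt_signal(
    buffer: str,
    *,
    saw_ga: bool,
    idle_for: float,
    prompt_re: re.Pattern[str] = DEFAULT_PROMPT_RE,
    settle_time: float = 0.3,
) -> str | None:
    """Return the name of the signal that identifies a prompt, or None."""
    if saw_ga:
        return "ga"
    # Only the unterminated tail after the last line break can be a prompt.
    tail = buffer.rsplit("\n", 1)[-1]
    if tail and prompt_re.search(tail):
        return "regex"
    if tail and idle_for >= settle_time:
        return "quiescence"
    return None


known_prompt = prompt_signal("Welcome\r\n> ", saw_ga=False, idle_for=0.0)
fragment = prompt_signal("You see a gob", saw_ga=False, idle_for=0.05)
settled = prompt_signal("Select gender (1-3): ", saw_ga=False, idle_for=0.5)
complete_line = prompt_signal("Done.\r\n", saw_ga=False, idle_for=1.0)
```

A partial line such as "You see a gob" is not treated as a prompt until the settle time has passed, which is the fragmentation case the review raised.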

3. Test Fixtures & Server Lifecycle

Session-scoped server fixture — The engine starts once per pytest session (or per-module if configured), avoiding the ~2s startup cost per test.

Function-scoped client fixture — Each test gets a fresh TCP connection. The fixture handles connect, optional login, and guaranteed disconnect on teardown.

State isolation — Between tests, world state is snapshotted and restored to prevent state bleed. See State Isolation & Determinism.

Fixture hierarchy:

mud_server (session) ─── starts GameEngine + telnet on ephemeral port
  │                      waits for readiness probe before yielding
  ├── mud_client (function) ─── connects, logs in via AuthDriver, selects character
  ├── raw_client (function) ─── connects only (for testing login flow itself)
  └── registered_account (session) ─── creates a test account; validates on each use

Port allocation:

  • Uses socket.bind(('', 0)) to get a free ephemeral port
  • Port passed to GameEngine settings before startup
  • Eliminates port conflicts in parallel CI
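
Binding to port 0 can be sketched as below. The helper name find_free_port is hypothetical, and binding to 127.0.0.1 rather than '' is an assumption based on tests running against localhost only.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]  # (address, port) for AF_INET


port = find_free_port()
```

The socket is closed before the server binds, so another process could in principle claim the port in between; in practice the window is small, and the readiness probe would surface a failed bind.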

Server readiness:

The mud_server fixture waits for the telnet listener to accept connections before yielding, using a retry loop with exponential backoff:

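import asyncio
import time


async def _wait_for_ready(host: str, port: int, timeout: float = 15.0) -> None:
    """Poll until the telnet listener accepts a TCP connection."""
    deadline = time.monotonic() + timeout
    delay = 0.05  # initial retry delay; doubles after each failure, capped at 1s
    while time.monotonic() < deadline:
        try:
            _reader, writer = await asyncio.open_connection(host, port)
            writer.close()
            await writer.wait_closed()
            return
        except OSError:  # includes ConnectionRefusedError
            await asyncio.sleep(delay)
            delay = min(delay * 2, 1.0)
    raise TimeoutError(f"Server not ready on {host}:{port} after {timeout}s")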

Server liveness:

If the engine crashes mid-session, subsequent tests skip gracefully:

@pytest.fixture
async def mud_client(mud_server):
    if not mud_server.is_alive():
        pytest.skip("Server crashed in a previous test")
    ...

AuthDriver abstraction:

The login flow is abstracted behind a pluggable AuthDriver protocol, keeping the fixture layer portable. MAID provides MaidAuthDriver; other MUDs implement their own:

from typing import Protocol


class AuthDriver(Protocol):
    """Pluggable login flow for different MUD engines."""

    async def register(
        self, client: MUDClient, username: str, password: str, email: str
    ) -> None:
        """Register a new account via the MUD's registration flow."""

    async def login(
        self, client: MUDClient, username: str, password: str
    ) -> None:
        """Log in to an existing account."""

    async def select_character(
        self, client: MUDClient, character_name: str
    ) -> None:
        """Select or create a character and enter the game world."""

Design decision (post-review): All three reviewers identified the MAID-coupled fixtures as undermining portability claims. The AuthDriver protocol makes the fixture layer genuinely reusable across MUD engines.
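
To illustrate how another MUD might plug in, here is a sketch of a driver for an imaginary engine that logs in with a single command line. Everything in it is hypothetical: OneLineAuthDriver, the create/connect/play commands, the expected strings, and the FakeClient stand-in that replaces MUDClient so the example runs without a server.

```python
import asyncio


class FakeClient:
    """Stand-in for MUDClient: records sends and treats every expect as satisfied."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send(self, text: str) -> None:
        self.sent.append(text)

    async def expect(self, pattern: str) -> str:
        return pattern


class OneLineAuthDriver:
    """Hypothetical driver for a MUD that uses 'connect <user> <password>'."""

    async def register(self, client, username: str, password: str, email: str) -> None:
        await client.send(f"create {username} {password}")
        await client.expect("Account created")

    async def login(self, client, username: str, password: str) -> None:
        await client.send(f"connect {username} {password}")
        await client.expect("Welcome back")

    async def select_character(self, client, character_name: str) -> None:
        await client.send(f"play {character_name}")
        await client.expect("Entering World")


async def enter_game(driver, client) -> None:
    """Roughly what a logged-in client fixture would do with any driver."""
    await driver.login(client, "tester", "secret")
    await driver.select_character(client, "Thorn")


client = FakeClient()
asyncio.run(enter_game(OneLineAuthDriver(), client))
```

Because AuthDriver is a Protocol, a class like this needs no inheritance; matching the three method signatures is enough for a type checker to accept it.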

4. Script DSL

A YAML-based format for declarative test scenarios. Aimed at game designers and QA who may not write Python.

# tests/e2e/scripts/test_character_creation.yaml
name: "Character creation - Human Warrior"
description: "Verify a new player can create a Human Warrior character"
tags: [smoke, character-creation]

setup:
  account:
    username: "testplayer"
    password: "testpass123"
    register: true

steps:
  - send: "1"
    expect: "Select your race"

  - send: "1"
    expect: "Select your class"

  - send: "1"
    expect: "Select your gender"

  - send: "3"
    expect: "Character Summary"
    expect_all:
      - "Human"
      - "Warrior"
      - "Neutral"

  - send: "Y"
    expect: "Welcome to the world"

  - send: "look"
    expect_any:
      - "You see"
      - "exits:"

  - send: "quit"
    expect: "Goodbye"

Script runner — A pytest plugin collects .yaml files from a configured directory and generates parametrized test cases. Each script becomes a test item in pytest output:

tests/e2e/test_scripts.py::test_character_creation_human_warrior PASSED
tests/e2e/test_scripts.py::test_basic_movement PASSED
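
One plausible mapping from a script's name field to the test IDs shown above is a simple slug. The rule and the function name script_test_id are assumptions; the plugin's actual naming scheme is not specified here.

```python
import re


def script_test_id(name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs into underscores."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"test_{slug}"


creation_id = script_test_id("Character creation - Human Warrior")
movement_id = script_test_id("Basic movement")
```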

5. Session Recording & Playback

Record mode — Wraps a MUDClient to capture all send/receive pairs with timestamps. Saves to a .teltest JSON file:

{
  "recorded_at": "2026-02-18T06:00:00Z",
  "server": "localhost:4000",
  "events": [
    {"t": 0.0, "type": "recv", "text": "Welcome to MAID\r\n"},
    {"t": 0.1, "type": "recv", "text": "Your choice: "},
    {"t": 1.2, "type": "send", "text": "R"},
    {"t": 1.4, "type": "recv", "text": "Choose a username: "},
    ...
  ]
}

Playback mode — Converts a recording into a YAML script or a Python test, replacing literal text with fuzzy patterns where appropriate.
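
A naive conversion can pair each send event with the last non-empty line received before the next send. The sketch below assumes the event shape shown in the recording above; recording_to_steps is a hypothetical name, and the fuzzy-pattern substitution is not attempted.

```python
def recording_to_steps(events: list[dict]) -> list[dict]:
    """Turn recorded events into send/expect steps (literal text only)."""
    steps: list[dict] = []
    current: dict | None = None
    for event in events:
        if event["type"] == "send":
            if current is not None:
                steps.append(current)
            current = {"send": event["text"]}
        elif event["type"] == "recv" and current is not None:
            # Keep the last non-empty line; later output overrides earlier output.
            lines = [ln.strip() for ln in event["text"].splitlines() if ln.strip()]
            if lines:
                current["expect"] = lines[-1]
    if current is not None:
        steps.append(current)
    return steps


events = [
    {"t": 0.0, "type": "recv", "text": "Welcome to MAID\r\n"},
    {"t": 0.1, "type": "recv", "text": "Your choice: "},
    {"t": 1.2, "type": "send", "text": "R"},
    {"t": 1.4, "type": "recv", "text": "Choose a username: "},
]
steps = recording_to_steps(events)
```

Output received before the first send (the banner and menu) is dropped in this sketch; a fuller converter would likely emit it as an initial expectation.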

CLI tool:

# Record a session
uv run teltest record localhost 4000 -o session.teltest

# Convert recording to YAML test
uv run teltest convert session.teltest -o test_flow.yaml

# Run scripts directly
uv run teltest run tests/e2e/scripts/ --server localhost:4000

API Design

MUDClient API

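from __future__ import annotations  # Match and TranscriptEntry are defined below

import re
from dataclasses import dataclass
from typing import Any, Literal


class MUDClient: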
    """Async telnet client for MUD E2E testing.

    This client is MUD-engine agnostic. It speaks raw telnet and
    provides pattern-matching expectations on text output.

    Example:
        async with MUDClient("localhost", 4000) as client:
            await client.expect("Welcome")
            await client.send("connect user pass")
            await client.expect("Entering World")
    """

    def __init__(
        self,
        host: str = "localhost",
        port: int = 4000,
        *,
        timeout: float = 5.0,
        strip_ansi: bool = True,
        encoding: str = "utf-8",
        negotiate_gmcp: bool = False,
    ) -> None: ...

    # --- Connection lifecycle ---

    async def connect(self) -> None:
        """Open TCP connection and complete telnet negotiation."""

    async def disconnect(self) -> None:
        """Gracefully close the connection."""

    async def __aenter__(self) -> "MUDClient": ...
    async def __aexit__(self, *exc: object) -> None: ...

    @property
    def is_connected(self) -> bool: ...

    # --- Sending ---

    async def send(self, text: str) -> None:
        """Send a line of text (appends \\r\\n)."""

    async def send_raw(self, data: bytes) -> None:
        """Send raw bytes without modification."""

    # --- Expecting ---

    async def expect(
        self,
        pattern: str | re.Pattern[str],
        *,
        timeout: float | None = None,
        from_start: bool = False,
    ) -> Match:
        """Wait for output matching pattern. Returns Match with context.

        Searches output received since the last send()/expect() call (cursor-based).
        Set from_start=True to search the entire buffer history.

        Args:
            pattern: Substring or compiled regex to match against output.
            timeout: Override default timeout for this call.
            from_start: If True, search from buffer start ignoring cursor.

        Returns:
            Match object containing matched text, full line, and buffer context.

        Raises:
            ExpectTimeout: If pattern not found within timeout.
            ConnectionClosed: If the server closed the connection.
        """

    async def expect_prompt(
        self,
        prompt: str | re.Pattern[str] = "> ",
        *,
        timeout: float | None = None,
    ) -> str:
        """Wait for a prompt using multi-signal detection.

        Detection priority:
        1. Telnet GA (Go Ahead) signal
        2. Match against prompt pattern
        3. Quiescence (no new data for prompt_settle_time)
        """

    async def expect_line(
        self,
        text: str,
        *,
        timeout: float | None = None,
    ) -> str:
        """Wait for an exact complete line."""

    async def expect_sequence(
        self,
        patterns: list[str | re.Pattern[str]],
        *,
        timeout: float | None = None,
        per_step_timeout: float | None = None,
    ) -> list[Match]:
        """Wait for multiple patterns in order.

        Args:
            timeout: Total time allowed for all patterns (default).
            per_step_timeout: If set, each pattern gets this much time instead.
        """

    async def expect_any(
        self,
        patterns: list[str | re.Pattern[str]],
        *,
        timeout: float | None = None,
    ) -> tuple[int, Match]:
        """Wait for any of the patterns. Returns (index, match)."""

    async def expect_not(
        self,
        pattern: str | re.Pattern[str],
        *,
        until: str | re.Pattern[str],
        timeout: float | None = None,
    ) -> None:
        """Assert that pattern does NOT appear before the 'until' boundary.

        This avoids unconditional sleeps by waiting for a positive signal
        (e.g., the next prompt) and asserting the negative pattern didn't
        appear before it.

        Args:
            pattern: The text/regex that must NOT appear.
            until: A positive boundary pattern to wait for.
            timeout: Max time to wait for the 'until' boundary.

        Raises:
            UnexpectedMatch: If pattern appears before until.
            ExpectTimeout: If until is not found within timeout.
        """

    # --- Buffer access ---

    @property
    def output(self) -> str:
        """All received text since connection (ring buffer, max 10k lines)."""

    @property
    def recent(self) -> str:
        """Text received since last send() or expect() call (cursor-based)."""

    def clear_buffer(self) -> None:
        """Clear the output buffer and reset the read cursor."""

    @property
    def transcript(self) -> list[TranscriptEntry]:
        """Full session transcript with timestamps (ring buffer)."""

    # --- Connection health ---

    @property
    def reader_error(self) -> Exception | None:
        """Last exception from the background reader task, if any."""

    # --- GMCP (optional) ---

    async def send_gmcp(self, package: str, data: dict[str, Any]) -> None:
        """Send a GMCP message."""

    async def expect_gmcp(
        self,
        package: str,
        *,
        timeout: float | None = None,
    ) -> dict[str, Any]:
        """Wait for a GMCP message on the given package."""

    @property
    def gmcp_messages(self) -> list[tuple[str, dict[str, Any]]]:
        """All received GMCP messages."""


@dataclass(frozen=True)
class Match:
    """Result of a successful expect() call."""
    text: str           # The matched text
    line: str           # The full line containing the match
    pattern: str        # The pattern that matched
    elapsed: float      # Seconds waited
    buffer_before: str  # Buffer context before the match (last N lines)


@dataclass(frozen=True)
class TranscriptEntry:
    """Single entry in the session transcript."""
    timestamp: float
    direction: Literal["send", "recv", "gmcp_send", "gmcp_recv"]
    text: str


class ExpectTimeout(AssertionError):
    """Raised when expect() times out.

    Attributes:
        pattern: What we were looking for.
        timeout: How long we waited.
        buffer: What was actually received.
    """
    pattern: str
    timeout: float
    buffer: str


class ConnectionClosed(Exception):
    """Raised when the server closes the connection during expect().

    Attributes:
        buffer: Text received before the connection closed.
    """
    buffer: str


class UnexpectedMatch(AssertionError):
    """Raised by expect_not() when the forbidden pattern appears.

    Attributes:
        pattern: The pattern that should not have appeared.
        match: The Match object showing where it appeared.
    """
    pattern: str
    match: Match

Fixture API

@pytest.fixture(scope="session")
async def mud_server() -> AsyncGenerator[MUDServer, None]:
    """Start a MAID server for E2E testing.

    Starts GameEngine with in-memory storage on an ephemeral port.
    Waits for readiness probe before yielding.
    The server runs for the entire pytest session.

    Yields:
        MUDServer with .host, .port, .engine, .is_alive() attributes.
    """

@pytest.fixture
async def raw_client(mud_server: MUDServer) -> AsyncGenerator[MUDClient, None]:
    """A connected MUDClient with no login.

    Use this for testing the login/registration flow itself.
    Skips if server is not alive (crashed in a previous test).
    """

@pytest.fixture
async def mud_client(
    mud_server: MUDServer,
    registered_account: AccountInfo,
) -> AsyncGenerator[MUDClient, None]:
    """A MUDClient that is logged in and in the game world.

    Uses the configured AuthDriver to handle the login flow.
    Ready to send game commands immediately.
    Skips if server is not alive.
    """

@pytest.fixture(scope="session")
async def registered_account(mud_server: MUDServer) -> AccountInfo:
    """An account registered on the test server.

    Created once per session via the AuthDriver registration flow.
    Validates account still exists on each use.
    """

@dataclass
class MUDServer:
    """Handle to the running test server."""
    host: str
    port: int
    engine: GameEngine

    def is_alive(self) -> bool:
        """Check if the engine is still running."""

    async def wait_ticks(self, n: int = 1) -> None:
        """Block until N game ticks have been processed."""

    async def snapshot_world(self) -> WorldSnapshot:
        """Capture current world state for later restoration."""

    async def restore_world(self, snapshot: WorldSnapshot) -> None:
        """Restore world state from a snapshot."""

@dataclass
class AccountInfo:
    """Credentials for a test account."""
    username: str
    password: str
    email: str

Script DSL API

# Full schema for a TelTest script
name: string          # Required. Test name (becomes pytest node ID)
description: string   # Optional. Shown in verbose output
tags: [string]        # Optional. Maps to pytest markers
timeout: float        # Optional. Default timeout for all steps (default: 5.0)

setup:                # Optional. Pre-test configuration
  account:            # Account to use
    username: string
    password: string
    register: bool    # If true, register the account first
  character:          # Character to create/select
    name: string
    race: string      # e.g. "human", "elf"
    class: string     # e.g. "warrior", "mage"
    select: int       # Or select existing by index

steps:                # Required. Test actions
  - send: string                # Send a command
    expect: string              # Wait for substring (shorthand)
    expect_re: string           # Wait for regex
    expect_all: [string]        # Wait for ALL substrings (any order)
    expect_any: [string]        # Wait for ANY substring
    expect_not:                 # Assert text does NOT appear before boundary
      pattern: string
      until: string             # Required positive boundary
    expect_prompt: string       # Wait for specific prompt
    timeout: float              # Per-step timeout override
    delay: float                # Wait N seconds before this step

  - branch: string              # Send text and branch on response
    cases:
      "pattern A":              # If output contains this...
        - send: "..."           #   ...run these steps
      "pattern B":
        - send: "..."

  - note: string                # Comment in the transcript (no action)

  - group: string               # Named step group for reporting
    steps: [...]                 # Nested steps

teardown:              # Optional. Cleanup actions
  - send: "quit"

Telnet Protocol Handling

The MUDClient must handle telnet at the byte level. The design isolates protocol handling into a TelnetProtocol layer.

  TCP bytes in
┌──────────────┐
│ TelnetProtocol│──── IAC negotiation (auto-respond)
│              │──── MCCP2 decompression
│              │──── GMCP extraction
└──────┬───────┘
       │ clean text
┌──────────────┐
│  ANSIStripper │──── Remove \x1b[...m sequences
└──────┬───────┘
       │ plain text
┌──────────────┐
│ OutputBuffer  │──── Line splitting, prompt detection
└──────────────┘
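
The ANSI stripping stage can be approximated with one regular expression. This sketch removes all CSI sequences (ESC [ ... final byte), which is broader than the colour-only \x1b[...m case named in the diagram; the names ANSI_CSI_RE and strip_ansi are illustrative.

```python
import re

# CSI sequence: ESC [ then parameter bytes, intermediate bytes, and one final byte.
ANSI_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove ANSI CSI escape sequences, leaving plain text."""
    return ANSI_CSI_RE.sub("", text)


plain = strip_ansi("\x1b[1;31mYou hit\x1b[0m the goblin")
```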

IAC negotiation strategy:

| Server sends | Client responds | Notes |
|---|---|---|
| WILL GMCP (201) | DO GMCP (if gmcp enabled) or DONT GMCP | GMCP opt-in |
| WILL ECHO (1) | DO ECHO | Password hiding |
| WILL SGA (3) | DO SGA | Standard |
| DO TTYPE (24) | WILL TTYPE, then send "TELTEST" | Identifies as TelTest client |
| DO NAWS (31) | WILL NAWS, then send 80×24 | Standard terminal size |
| DO MSDP (69) | WONT MSDP | Not needed for testing |
| DO MXP (91) | WONT MXP | Not needed for testing |
| WILL MCCP2 (86) | DONT MCCP2 | Compression adds complexity; decline by default |
| Anything else | WONT or DONT | Safe default: refuse unknown options |
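
The table translates fairly directly into a reply function. The sketch below covers only the initial WILL/DO replies and the NAWS size report; the function names are hypothetical, and the TTYPE subnegotiation and handling of server WONT/DONT are omitted.

```python
# Telnet command bytes (RFC 854) and the option codes from the table above.
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240
ECHO, SGA, TTYPE, NAWS, GMCP = 1, 3, 24, 31, 201


def negotiate(command: int, option: int, *, gmcp: bool = False) -> bytes:
    """Return the 3-byte reply to a server WILL or DO request."""
    if command == WILL:
        accept = option in (ECHO, SGA) or (option == GMCP and gmcp)
        return bytes([IAC, DO if accept else DONT, option])
    if command == DO:
        accept = option in (TTYPE, NAWS)
        return bytes([IAC, WILL if accept else WONT, option])
    raise ValueError(f"unsupported telnet command: {command}")


def naws_report(width: int = 80, height: int = 24) -> bytes:
    """Build the NAWS subnegotiation (sizes below 255 need no IAC escaping)."""
    return bytes([IAC, SB, NAWS, *divmod(width, 256), *divmod(height, 256), IAC, SE])


gmcp_off = negotiate(WILL, GMCP)
gmcp_on = negotiate(WILL, GMCP, gmcp=True)
naws_reply = negotiate(DO, NAWS) + naws_report()
unknown = negotiate(DO, 69)  # MSDP falls through to the refuse-by-default rule
```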

Prompt detection:

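Prompts are text that arrives without a trailing \r\n. The buffer tracks whether the last chunk ended mid-line, and expect_prompt() matches against the current unterminated line in the buffer. Because a fragmented packet can also leave the buffer mid-line, this check is combined with the GA and quiescence signals described under Expectation Engine.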


State Isolation & Determinism

This section addresses the #1 concern from adversarial review. All three reviewers (Opus, Codex, Gemini) identified shared mutable state as the most critical risk to test reliability.

The Problem

The mud_server fixture is session-scoped for performance (~2s startup cost). But this means all tests share one GameEngine and one World. Without isolation:

  • Test A creates character "Thorn" → Test B sees "Thorn" in the room
  • Test C drops a sword → Test D picks it up
  • Test ordering becomes load-bearing → flaky CI

Solution: World Snapshot/Restore

The mud_server fixture exposes snapshot_world() / restore_world() methods. A function-scoped autouse fixture captures state before each test and restores it after:

@pytest.fixture(autouse=True)
async def _isolate_world(mud_server: MUDServer) -> AsyncGenerator[None, None]:
    """Snapshot and restore world state around each test."""
    snapshot = await mud_server.snapshot_world()
    yield
    await mud_server.restore_world(snapshot)

What gets snapshot/restored:

  • Entity registry (entities, components, tags)
  • Room index and spatial data
  • In-memory document store collections
  • Command registry state

What is NOT restored (intentionally):

  • Account registry (session-scoped accounts persist)
  • Engine configuration
  • Network listener state
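
The snapshot/restore idea can be shown on a toy world using deep copies. ToyWorld and both functions are hypothetical stand-ins; how the real engine captures its entity registry and document store is not described here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyWorld:
    """Hypothetical stand-in for the engine's mutable world state."""
    entities: dict[str, dict[str, str]] = field(default_factory=dict)
    rooms: dict[str, list[str]] = field(default_factory=dict)


def snapshot_world(world: ToyWorld) -> ToyWorld:
    """Capture an independent copy of the current state."""
    return copy.deepcopy(world)


def restore_world(world: ToyWorld, snapshot: ToyWorld) -> None:
    """Restore in place so existing references to the world stay valid."""
    world.entities = copy.deepcopy(snapshot.entities)
    world.rooms = copy.deepcopy(snapshot.rooms)


world = ToyWorld(entities={"sword": {"location": "plaza"}}, rooms={"plaza": ["sword"]})
before = snapshot_world(world)

# A test mutates the world: it picks up the sword and creates a character.
world.rooms["plaza"].remove("sword")
world.entities["thorn"] = {"location": "plaza"}

restore_world(world, before)
```

Copying again on restore keeps the snapshot itself untouched, so the same snapshot could be reused if a restore has to be repeated.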

Tick Synchronization

The game engine runs a tick loop that processes systems asynchronously. Tests that depend on system processing (combat, NPC AI, item respawn) need to synchronize with the tick loop.

# Wait for the combat system to process the attack
await mud_client.send("attack goblin")
await mud_server.wait_ticks(2)  # Wait for 2 ticks to process
await mud_client.expect("You hit")

API:

class MUDServer:
    async def wait_ticks(self, n: int = 1) -> None:
        """Block until N ticks have been processed.

        Uses an asyncio.Event that the engine tick loop signals after each tick.
        """

    async def pause_ticks(self) -> None:
        """Pause the tick loop for deterministic step-through testing."""

    async def step_tick(self) -> None:
        """Process exactly one tick while paused."""

    async def resume_ticks(self) -> None:
        """Resume the tick loop after pausing."""

Design decision: Tick sync is exposed on MUDServer (not MUDClient) because it requires engine access. The client is intentionally ignorant of the server's internals to maintain the portability boundary.
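
A tick-wait primitive can be sketched as below. Note the swap: the docstring above mentions an asyncio.Event, while this sketch uses an asyncio.Condition with a counter, which makes it straightforward to wait for N ticks without missing a signal. TickCounter and tick_done are hypothetical names.

```python
import asyncio


class TickCounter:
    """Counts processed ticks and lets callers wait for N more of them."""

    def __init__(self) -> None:
        self.count = 0
        self._cond = asyncio.Condition()

    async def tick_done(self) -> None:
        """Called by the engine loop after each tick."""
        async with self._cond:
            self.count += 1
            self._cond.notify_all()

    async def wait_ticks(self, n: int = 1) -> None:
        """Block until n further ticks have been processed."""
        async with self._cond:
            target = self.count + n
            await self._cond.wait_for(lambda: self.count >= target)


async def demo() -> int:
    ticks = TickCounter()

    async def engine_loop() -> None:
        for _ in range(5):
            await asyncio.sleep(0.01)  # stand-in for running the systems
            await ticks.tick_done()

    engine = asyncio.create_task(engine_loop())
    await ticks.wait_ticks(2)
    seen = ticks.count  # at least two ticks have run by now
    await engine
    return seen


ticks_seen = asyncio.run(demo())
```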

Unique Test Data

To further reduce state coupling, fixtures generate unique data per test:

from uuid import uuid4

@pytest.fixture
def unique_name() -> str:
    """Generate a unique character/account name per test."""
    return f"test_{uuid4().hex[:8]}"

This prevents name collisions even when snapshot/restore is not perfect.


Test Patterns & Examples

Pattern 1: Full Login Flow

@pytest.mark.e2e
async def test_register_login_create_character(raw_client: MUDClient) -> None:
    """Test complete new-player experience."""
    # Registration
    await raw_client.expect("Please select an option")
    await raw_client.send("R")
    await raw_client.expect("Choose a username")
    await raw_client.send("testplayer")
    await raw_client.expect("Email address")
    await raw_client.send("test@example.com")
    await raw_client.expect("Choose a password")
    await raw_client.send("secret123")
    await raw_client.expect("Confirm password")
    await raw_client.send("secret123")
    await raw_client.expect("Account created")

    # Character creation
    await raw_client.expect("Character Selection")
    await raw_client.send("C")
    await raw_client.expect("Enter character name")
    await raw_client.send("Thorn")
    await raw_client.expect("Select your race")
    await raw_client.send("1")  # Human
    await raw_client.expect("Select your class")
    await raw_client.send("1")  # Warrior
    await raw_client.expect("Select your gender")
    await raw_client.send("3")  # Neutral
    await raw_client.expect("Character Summary")
    await raw_client.send("Y")
    await raw_client.expect("Welcome to the world, Thorn!")

    # In world
    await raw_client.expect("Entering World")
    await raw_client.send("quit")
    await raw_client.expect("Goodbye")

Pattern 2: Gameplay Commands

import re

@pytest.mark.e2e
async def test_look_and_inventory(mud_client: MUDClient) -> None:
    """Test the look and inventory commands. mud_client is already logged in."""
    await mud_client.send("look")
    match = await mud_client.expect(re.compile(r"exits?:", re.IGNORECASE))
    assert match.text  # Room has exits

    await mud_client.send("inventory")
    await mud_client.expect_any(["You are carrying", "Your inventory is empty"])

Pattern 3: Multi-Client Interaction

@pytest.mark.e2e
async def test_player_sees_other_player(mud_server: MUDServer) -> None:
    """Two players in the same room can see each other."""
    async with logged_in_client(mud_server, "alice") as alice, \
               logged_in_client(mud_server, "bob") as bob:
        # Both start in the same room
        await alice.send("look")
        await alice.expect("Bob")

        await bob.send("say Hello!")
        await alice.expect("Bob says")
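
The logged_in_client() helper used above is not defined in this document. One possible shape is an async context manager that drives the login prompts and always closes the connection. In this sketch the prompts are taken from Appendix B, but everything else is an assumption: the ClientLike protocol, the FakeClient stand-in, and the signature, which takes an already connected client where the example above passes mud_server. It stops at character selection and does not pick a character.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Protocol


class ClientLike(Protocol):
    """The subset of client behaviour this sketch relies on (assumed)."""

    async def send(self, text: str) -> None: ...
    async def expect(self, pattern: str) -> object: ...
    async def close(self) -> None: ...


class FakeClient:
    """Stand-in that records calls so the sketch runs without a server."""

    def __init__(self) -> None:
        self.log: list[str] = []

    async def send(self, text: str) -> None:
        self.log.append(f"SEND {text}")

    async def expect(self, pattern: str) -> object:
        self.log.append(f"EXPECT {pattern}")
        return pattern

    async def close(self) -> None:
        self.log.append("CLOSE")


@asynccontextmanager
async def logged_in_client(
    client: ClientLike, username: str, password: str = "testpass123"
) -> AsyncIterator[ClientLike]:
    """Log in over an already connected client and always close it on exit."""
    try:
        await client.expect("Please select an option")
        await client.send("L")
        await client.expect("Username")
        await client.send(username)
        await client.expect("Password")
        await client.send(password)
        await client.expect("Character Selection")
        yield client
    finally:
        await client.close()


async def demo() -> list[str]:
    """Run the helper against the fake and return the recorded calls."""
    fake = FakeClient()
    async with logged_in_client(fake, "alice") as alice:
        await alice.send("look")
    return fake.log
```

Putting close() in a finally block means a failed expect() during login still releases the connection, which matters when two clients share one server.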

Pattern 4: YAML Script

name: "Basic navigation smoke test"
tags: [smoke, navigation]

setup:
  account:
    username: "navtest"
    password: "testpass123"
    register: true
  character:
    name: "Navigator"
    race: "human"
    class: "warrior"

steps:
  - send: "look"
    expect: "exits"

  - send: "inventory"
    expect_any: ["carrying", "empty"]

  - send: "quit"
    expect: "Goodbye"
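
Each entry under steps pairs one send with either expect or expect_any. The sketch below validates and normalises such entries once the YAML has been parsed into dictionaries. It uses dataclasses to stay within the standard library, whereas the package's schema module is described as Pydantic-based. The names Step, parse_step and step_passes are hypothetical. Requiring exactly one expectation and matching by substring are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    send: str
    expect_any: tuple[str, ...]  # a single `expect` becomes a 1-tuple


def parse_step(raw: dict) -> Step:
    """Validate one mapping from the `steps:` list and normalise it."""
    if not isinstance(raw.get("send"), str):
        raise ValueError(f"step needs a string 'send': {raw!r}")
    has_expect = "expect" in raw
    has_any = "expect_any" in raw
    if has_expect == has_any:  # both present or both missing
        raise ValueError(f"step needs exactly one of 'expect'/'expect_any': {raw!r}")
    patterns = [raw["expect"]] if has_expect else raw["expect_any"]
    if (not isinstance(patterns, list) or not patterns
            or not all(isinstance(p, str) for p in patterns)):
        raise ValueError(f"patterns must be a non-empty list of strings: {raw!r}")
    return Step(send=raw["send"], expect_any=tuple(patterns))


def step_passes(step: Step, output: str) -> bool:
    """True when any expected substring occurs in the server output."""
    return any(p in output for p in step.expect_any)
```

Normalising both forms to a tuple lets a runner treat expect as the one-element case of expect_any.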

Error Handling & Diagnostics

ExpectTimeout Failure Output

When expect() times out, the error message provides full context:

teltest.ExpectTimeout: Pattern not found within 5.0s

  Expected: "Welcome to the world"

  Received (last 20 lines):
  ─────────────────────────
  |  Select your gender:
  |
  |    [1] Male
  |    [2] Female
  |    [3] Neutral
  |
  |    [C]ancel - Return to selection
  |
  |  Select gender (1-3): █  ← (prompt, waiting for input)
  ─────────────────────────

  Transcript (last 5 interactions):
    [0.00s] RECV: "Character Summary"
    [0.01s] RECV: "Create this character?"
    [0.50s] SEND: "Y"
    [0.52s] RECV: "Character created successfully."
    [5.00s] TIMEOUT waiting for "Welcome to the world"

  Hint: The server may be waiting for additional input. Check if
        a prompt was expected before this step.
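
A message of this shape can be assembled inside the exception itself. The following is a minimal sketch and not TelTest's actual class: the constructor parameters, the Exception base class and the omission of the hint line are all assumptions made for illustration.

```python
class ExpectTimeout(Exception):
    """Raised when a pattern is not seen in time; carries diagnostic context."""

    def __init__(self, pattern: str, timeout: float,
                 received: list[str], transcript: list[str],
                 tail: int = 20, interactions: int = 5) -> None:
        self.pattern = pattern
        self.timeout = timeout
        shown = received[-tail:]            # most recent buffer lines
        recent = transcript[-interactions:]  # most recent send/recv events
        lines = [
            f"Pattern not found within {timeout:.1f}s",
            "",
            f'  Expected: "{pattern}"',
            "",
            f"  Received (last {tail} lines):",
            *[f"  |  {line}" for line in shown],
            "",
            f"  Transcript (last {interactions} interactions):",
            *[f"    {entry}" for entry in recent],
        ]
        super().__init__("\n".join(lines))
```

Building the text at raise time keeps the buffer snapshot tied to the moment of failure, so output that arrives later cannot alter the report.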

Connection Failure

teltest.ConnectionFailed: Could not connect to localhost:4000

  Attempted: 3 retries over 6.0s
  Last error: ConnectionRefusedError: [Errno 111] Connection refused

  Hint: Is the server running? Check the mud_server fixture log.
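
The retry behaviour behind this error can be sketched as a small loop around any connect coroutine. The function name connect_with_retry, its parameters and its timing are assumptions; only the ConnectionFailed name comes from the output above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class ConnectionFailed(Exception):
    """Raised after all connection attempts have been exhausted."""


async def connect_with_retry(connect: Callable[[], Awaitable[T]],
                             retries: int = 3, delay: float = 2.0) -> T:
    """Call `connect` up to `retries` times, sleeping `delay` between tries."""
    last_error: Optional[OSError] = None
    for attempt in range(retries):
        try:
            return await connect()
        except OSError as exc:  # ConnectionRefusedError is a subclass
            last_error = exc
            if attempt < retries - 1:
                await asyncio.sleep(delay)
    raise ConnectionFailed(
        f"Attempted: {retries} retries, last error: {last_error!r}"
    ) from last_error
```

Against a real server, the connect argument could be lambda: asyncio.open_connection("localhost", 4000). Injecting it as a callable keeps the retry logic testable without a socket.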

Diagnostics Features

  • --teltest-verbose — Print full transcript to stdout on failure
  • --teltest-record — Save .teltest recording for every test
  • --teltest-timeout-multiplier=N — Scale all timeouts (useful for slow CI)
  • Transcript on every assertion failure — via pytest plugin hook

Configuration

Environment Variables

| Variable | Default | Description |
|---|---|---|
| TELTEST_HOST | localhost | Server host for standalone runs |
| TELTEST_PORT | 4000 | Server port for standalone runs |
| TELTEST_TIMEOUT | 5.0 | Default expect timeout (seconds) |
| TELTEST_TIMEOUT_MULTIPLIER | 1.0 | Multiply all timeouts (for CI) |
| TELTEST_STRIP_ANSI | true | Strip ANSI escape codes |
| TELTEST_SCRIPT_DIR | tests/e2e/scripts | YAML script search directory |
| TELTEST_RECORDING_DIR | tests/e2e/recordings | Recording output directory |
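
These variables could be read into a typed configuration object, as in the sketch below. The TelTestConfig class and load_config function are hypothetical names, and the set of strings accepted as true for TELTEST_STRIP_ANSI is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TelTestConfig:
    host: str
    port: int
    timeout: float
    timeout_multiplier: float
    strip_ansi: bool

    @property
    def effective_timeout(self) -> float:
        """Default expect timeout after applying the multiplier."""
        return self.timeout * self.timeout_multiplier


def load_config(env: Mapping[str, str] = os.environ) -> TelTestConfig:
    """Read TELTEST_* variables, falling back to the documented defaults."""
    return TelTestConfig(
        host=env.get("TELTEST_HOST", "localhost"),
        port=int(env.get("TELTEST_PORT", "4000")),
        timeout=float(env.get("TELTEST_TIMEOUT", "5.0")),
        timeout_multiplier=float(env.get("TELTEST_TIMEOUT_MULTIPLIER", "1.0")),
        # Assumption: "true", "1" and "yes" (any case) enable the flag.
        strip_ansi=env.get("TELTEST_STRIP_ANSI", "true").lower() in {"true", "1", "yes"},
    )
```

With TELTEST_TIMEOUT_MULTIPLIER=2.0, the default 5.0 s expect timeout becomes 10.0 s.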

pytest.ini / pyproject.toml

[tool.pytest.ini_options]
markers = [
    "e2e: End-to-end tests requiring a running server",
]

[tool.teltest]
timeout = 5.0
timeout_multiplier = 1.0
strip_ansi = true
script_dirs = ["tests/e2e/scripts"]
server_startup_timeout = 15.0

CI/CD Integration

Added post-review. Opus and Gemini flagged the missing CI section as a gap given success criterion SC-4.

GitHub Actions Workflow

name: E2E Tests
on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      MAID_DEBUG: "true"
      TELTEST_TIMEOUT_MULTIPLIER: "2.0"

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

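      - name: Run E2E tests
        # --timeout comes from the pytest-timeout plugin; --teltest-record
        # writes the recordings that the upload step collects on failure.
        run: uv run pytest tests/e2e/ -m e2e -x --timeout=120 --teltest-record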

      - name: Upload transcripts on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: teltest-transcripts
          path: tests/e2e/recordings/

Key CI Considerations

  • Timeout multiplier: Set TELTEST_TIMEOUT_MULTIPLIER=2.0 in CI — shared runners have variable CPU availability
  • -x (fail fast): Stop on the first failure, because once the server has crashed every subsequent test fails as well
  • Separate from unit tests: Run pytest -m e2e separately from pytest -m "not e2e" so unit test failures don't block E2E and vice versa
  • Port isolation: Each server binds to port 0, as in sock.bind(('', 0)), so the OS assigns a free ephemeral port; no ports are hardcoded
  • Zombie prevention: The mud_server fixture registers an atexit handler to terminate the engine. The GitHub Actions timeout-minutes provides a hard cap.
  • Docker networking: localhost works in CI containers. No special networking needed since server and tests run in the same process.
  • pytest-xdist support: For parallel execution, each xdist worker gets its own mud_server fixture (session-scoped per worker). Port allocation is already per-instance via ephemeral binding.
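
Ephemeral port allocation, as described in the port isolation item, can be sketched with the standard socket module. The helper name find_free_port is hypothetical.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]
```

There is a small race here: another process can claim the port between this socket closing and the server binding it. Passing port 0 directly to the server and reading the bound port back avoids that window.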

Running E2E Locally

# Run all E2E tests
uv run pytest tests/e2e/ -m e2e

# Run just YAML script tests
uv run pytest tests/e2e/test_scripts.py

# Run with verbose transcript output
uv run pytest tests/e2e/ -m e2e --teltest-verbose

# Validate YAML scripts without running
uv run teltest validate tests/e2e/scripts/

Package Structure

packages/teltest/
├── pyproject.toml              # Standalone package, no MAID deps
├── src/teltest/
│   ├── __init__.py             # Public API exports
│   ├── client.py               # MUDClient class
│   ├── protocol.py             # TelnetProtocol (IAC handling)
│   ├── ansi.py                 # ANSI escape code stripper
│   ├── buffer.py               # OutputBuffer with line/prompt tracking
│   ├── expect.py               # Expectation engine (Match, ExpectTimeout)
│   ├── gmcp.py                 # GMCP send/receive helpers
│   ├── recorder.py             # Session recording
│   ├── script/
│   │   ├── __init__.py
│   │   ├── schema.py           # YAML script schema (Pydantic models)
│   │   ├── runner.py           # Script execution engine
│   │   └── converter.py        # Recording → script converter
│   ├── cli.py                  # CLI entry point (record, convert, run, validate)
│   └── pytest_plugin.py        # Fixtures, markers, CLI options
├── tests/                      # TelTest's own tests
│   ├── test_client.py
│   ├── test_protocol.py
│   ├── test_ansi.py
│   ├── test_buffer.py
│   ├── test_expect.py
│   ├── test_script_runner.py
│   └── conftest.py
└── README.md                   # Standalone package docs

# MAID-specific E2E tests live in the main repo:
tests/e2e/
├── conftest.py                 # mud_server, mud_client, registered_account fixtures
├── test_login.py               # Login/registration flow tests
├── test_character.py           # Character creation/selection tests
├── test_gameplay.py            # Core gameplay command tests
├── test_multiplayer.py         # Multi-client interaction tests
└── scripts/                    # YAML test scripts
    ├── smoke_login.yaml
    ├── smoke_navigation.yaml
    └── character_creation_matrix.yaml

Key decision: TelTest is a standalone package with zero MAID dependencies. It can be published independently and used with any MUD that speaks telnet. The MAID-specific fixtures and tests live in the main repo's tests/e2e/ directory.


Migration & Adoption

Phase 1: Core Client & Fixtures

  • Implement MUDClient, TelnetProtocol, OutputBuffer, ANSIStripper
  • Implement mud_server, raw_client, mud_client fixtures
  • Write first E2E test: full login→create→look→quit flow
  • Register the e2e marker and run pytest -m e2e in the CI pipeline

Phase 2: Script DSL & Recording

  • Implement YAML script schema and runner
  • Implement session recorder and converter
  • Record reference sessions for smoke tests
  • Onboard QA team with script authoring docs

Phase 3: Advanced Features

  • GMCP assertion support
  • Multi-client test helpers
  • Parametrized character creation matrix (all race/class combos)
  • CI integration with timeout multiplier and parallel execution

Future Work

  • WebSocket client — A companion WSMUDClient for testing the /ws/game path with JSON message assertions
  • Fuzz testing — Random input generation to find crash-inducing sequences
  • Visual diff — Side-by-side comparison of expected vs. actual session transcripts
  • MUD compatibility matrix — Test TelTest against other MUD engines (Evennia, Ranvier, etc.) to validate portability
  • pytest-xdist support — Parallel E2E test execution with isolated server instances
  • AI-assisted test generation: Use an LLM to convert natural-language scenarios ("test that a warrior can equip a sword") into test scripts

Appendix A: Telnet Protocol Reference

| Byte | Name | Description |
|---|---|---|
| 255 | IAC | Interpret As Command |
| 254 | DONT | Refuse option |
| 253 | DO | Request option |
| 252 | WONT | Will not use option |
| 251 | WILL | Will use option |
| 250 | SB | Subnegotiation Begin |
| 240 | SE | Subnegotiation End |
| 1 | ECHO | Echo |
| 3 | SGA | Suppress Go-Ahead |
| 24 | TTYPE | Terminal Type |
| 31 | NAWS | Window Size |
| 69 | MSDP | MUD Server Data Protocol |
| 70 | MSSP | MUD Server Status Protocol |
| 86 | MCCP2 | Compression v2 |
| 91 | MXP | MUD eXtension Protocol |
| 201 | GMCP | Generic MUD Communication Protocol |
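
To show how these bytes appear on the wire, the sketch below separates telnet commands from plain text and answers every option request with a refusal. It is a simplified illustration, not TelTest's TelnetProtocol: the function name process_telnet is hypothetical, it assumes each call receives complete sequences, and a real client would accept some options (GMCP, for example) instead of refusing all of them.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240


def process_telnet(data: bytes) -> tuple[bytes, bytes]:
    """Split raw bytes into (plain_text, reply), refusing every option.

    Assumes `data` holds complete sequences; a real client must also
    buffer sequences that are split across TCP reads.
    """
    text = bytearray()
    reply = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != IAC:
            text.append(byte)
            i += 1
            continue
        if i + 1 >= len(data):
            break  # dangling IAC at end of input
        command = data[i + 1]
        if command == IAC:  # escaped literal 0xFF
            text.append(IAC)
            i += 2
        elif command in (DO, DONT, WILL, WONT):
            if i + 2 >= len(data):
                break  # option byte missing
            option = data[i + 2]
            if command == DO:
                reply += bytes([IAC, WONT, option])
            elif command == WILL:
                reply += bytes([IAC, DONT, option])
            i += 3
        elif command == SB:
            # Simplification: ignores an escaped IAC inside the payload.
            end = data.find(bytes([IAC, SE]), i + 2)
            if end == -1:
                break  # unterminated subnegotiation
            i = end + 2
        else:  # two-byte commands such as GA (249) or NOP (241)
            i += 2
    return bytes(text), bytes(reply)
```

For input IAC WILL ECHO followed by the text "Login: ", the function returns the text unchanged and the reply IAC DONT ECHO.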

Appendix B: MAID Login Flow Reference

Connect to server
"Please select an option"
  [L]ogin  [R]egister  [Q]uit
       ├── L ──► "Username: " → "Password: " → Login validation
       │              │                              │
       │              │                    ┌─────────┴─────────┐
       │              │                  Success             Fail (3 max)
       │              │                    │                    │
       │              │                    ▼                    ▼
       │              │            Character Selection     Disconnect
       │              │
       ├── R ──► "Choose a username: " → "Email: " → "Password: " → "Confirm: "
       │                                                                │
       │                                                       Account Created
       │                                                                │
       │                                                       Character Selection
       └── Q ──► Disconnect

Character Selection
       ├── [N] Select existing ──► "Welcome back, {name}!" ──► Enter World
       └── [C] Create new ──► Name → Race (1-10) → Class (1-10) → Gender (1-3)
                                    → Confirm (Y/N) → "Welcome to the world, {name}!" → Enter World

Enter World
"[Entering World...]"
{look output}
"> " (prompt)
   Main Loop: receive command → execute → send output → prompt
       └── "quit" / "exit" ──► "Goodbye!" ──► Disconnect

Appendix C: Adversarial Review Summary

This design was reviewed by three adversarial agents (Claude Opus, GPT Codex, Gemini Pro) tasked with finding flaws, gaps, and unrealistic assumptions. Their feedback was synthesized and incorporated into the design above. This appendix records the key findings for traceability.

Unanimous Critical Findings (all 3 reviewers)

| # | Finding | Resolution |
|---|---|---|
| 1 | Session-scoped server with shared mutable world state causes test bleed | Added State Isolation section with snapshot/restore |
| 2 | expect() scanning full buffer matches stale output from prior commands | Redesigned to cursor-based matching; from_start opt-in for full scan |
| 3 | Portability claim undermined by MAID-coupled fixtures | Added AuthDriver protocol abstraction in fixtures |

Major Findings (2+ reviewers)

| # | Finding | Resolution |
|---|---|---|
| 4 | expect_not() is an unconditional sleep with no correctness guarantee | Redesigned with required until positive boundary parameter |
| 5 | No tick synchronization primitive for deterministic testing | Added wait_ticks(), pause_ticks(), step_tick() to MUDServer |
| 6 | Prompt detection via missing \r\n is a TCP fragmentation race | Added multi-signal detection (GA, regex, quiescence) |
| 7 | No server readiness check before tests begin | Added readiness probe retry loop in mud_server fixture |
| 8 | Reader task lifecycle gaps (crash, EOF, cancellation) | Specified reader contract: capture + re-raise, sentinel on disconnect |
| 9 | Unbounded transcript buffer | Changed to ring buffer (10k lines default) |
| 10 | Missing CI/CD section | Added CI/CD Integration section |
| 11 | expect_sequence() timeout semantics ambiguous | Added explicit timeout (total) vs per_step_timeout parameters |
| 12 | No server liveness check after crash | Added is_alive() with pytest.skip in fixtures |
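
The cursor-based matching adopted for finding 2 can be illustrated with a small buffer that searches only the text after the previous match, unless from_start is set. This is a sketch of the idea, not the OutputBuffer implementation. The class name CursorBuffer and its methods are hypothetical.

```python
import re
from typing import Optional


class CursorBuffer:
    """Accumulates output and matches only text after the last match."""

    def __init__(self) -> None:
        self._text = ""
        self._cursor = 0

    def feed(self, chunk: str) -> None:
        """Append newly received server output."""
        self._text += chunk

    def search(self, pattern: str, from_start: bool = False) -> Optional[re.Match]:
        """Return a match and advance the cursor past it, else None."""
        start = 0 if from_start else self._cursor
        match = re.compile(pattern).search(self._text, start)
        if match is not None:
            # Never move the cursor backwards after a from_start scan.
            self._cursor = max(self._cursor, match.end())
        return match
```

Once "Exits:" has matched, a second search for it finds nothing until the server prints it again, which is how the cursor prevents stale matches.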

Notable Minor Findings

  • YAML DSL needs branch support for non-happy-path scenarios (added)
  • YAML script validation should happen before execution (added teltest validate CLI)
  • logged_in_client() helper used in examples but not defined in API (noted for implementation)
  • Session recording should optionally capture IAC negotiation bytes (noted for Phase 3)
  • Tags-to-markers mapping needs explicit rules (noted for implementation)

Reviewer Kudos (consensus strengths)

  • Standalone package separation (packages/teltest/ vs tests/e2e/)
  • ExpectTimeout error output design (buffer dump + transcript + hints)
  • Ephemeral port allocation for CI
  • ANSI stripping as default with opt-out
  • MCCP2 declined by default (pragmatic)
  • Phased migration plan (ship Phase 1 first)