TelTest — E2E Testing Framework for MUDs¶

Version: 1.1 (post-adversarial review) Status: Implemented Author(s): MAID Core Team Date: 2026-02-18 Priority: P1 — Critical for regression safety as game content grows

Table of Contents¶

Executive Summary
Problem Statement & Current State
Design Goals & Non-Goals
Architecture Overview
Core Components
1. MUDClient — Async Telnet Driver
2. Expectation Engine
3. Test Fixtures & Server Lifecycle
4. Script DSL
5. Session Recording & Playback
API Design
MUDClient API
Fixture API
Script DSL API
Telnet Protocol Handling
State Isolation & Determinism
Test Patterns & Examples
Error Handling & Diagnostics
Configuration
CI/CD Integration
Package Structure
Migration & Adoption
Future Work
Appendix A: Telnet Protocol Reference
Appendix B: MAID Login Flow Reference
Appendix C: Adversarial Review Summary

Executive Summary¶

TelTest is a pytest-native E2E testing framework for MUD engines. It provides an async telnet client that connects to a running server and drives it through text-based send/expect interactions — the same way a real player would. Think of it as "Playwright for MUDs."

Key capabilities:

MUDClient — An async Python telnet client with pattern-matching expect() that handles IAC negotiation, ANSI stripping, and prompt detection
Pytest fixtures — Managed server lifecycle (start engine → run tests → teardown) with automatic port allocation to avoid conflicts
Script DSL — Declarative YAML-based test scripts for non-programmers to author E2E scenarios
Session recording — Capture real play sessions and convert them into reproducible test cases

Business impact: Enables regression testing of the full player experience — login, character creation, combat, quests — across every commit. Catches integration bugs that unit tests miss (protocol negotiation, command routing, state transitions across ticks).

Problem Statement & Current State¶

What Exists Today¶

Layer	Test Coverage	Notes
ECS core	✅ Extensive	Unit tests for World, Entity, Component, System
Command handlers	✅ Good	MockSession-based tests for individual commands
Content packs	✅ Good	Unit tests for pack protocol compliance
Network protocol	⚠️ Minimal	Mocked `asyncio.start_server`, no real connections
Login/auth flow	❌ None	CharacterHandler/LoginHandler untested end-to-end
Full player journey	❌ None	No test connects, logs in, plays, and disconnects
Cross-system interactions	❌ None	Combat→inventory→quest chains untested

The Problem¶

No full-stack tests. A bug where character.race is a string instead of an enum passes all unit tests but crashes every new player on login (we just fixed this exact bug).
No protocol-level tests. Telnet negotiation, GMCP dispatch, MCCP compression, and prompt detection are tested only via mocks.
Manual QA is the only safety net. Every release requires someone to telnet in and manually walk through character creation.
Content pack interactions are invisible. When stdlib changes break classic-rpg, nothing catches it until runtime.

Success Criteria¶

ID	Criterion	Metric
SC-1	Full login→play→quit flow tested	≥1 test per connection path (telnet, websocket)
SC-2	Character creation tested	All races/classes create successfully
SC-3	Core gameplay loop tested	look, move, get, drop, inventory, say
SC-4	Tests run in CI	< 60s total, no port conflicts, no flakiness
SC-5	Non-programmers can author tests	YAML script DSL with clear documentation
SC-6	Portable to other MUDs	Core MUDClient has zero MAID-specific dependencies

Design Goals & Non-Goals¶

Goals¶

Pytest-native — Tests are normal async pytest functions using fixtures
Real network I/O — Tests connect over TCP, exercising the full stack
Fast — Server starts once per test session, tests reuse the connection where possible
Deterministic — Automatic port allocation, seeded randomness, tick synchronization
Portable — MUDClient is a standalone async telnet client usable with any MUD
Debuggable — Rich failure messages showing expected vs. received output with context

Non-Goals¶

GUI testing — Web frontend testing is out of scope (Playwright covers that)
Load/stress testing — Performance benchmarks are a separate concern
AI response testing — LLM outputs are non-deterministic; test the plumbing, not the prose
Cross-network testing — Tests run against localhost only

Architecture Overview¶

┌─────────────────────────────────────────────────────────────┐
│                        pytest session                        │
│                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │  test_login.py│  │test_combat.py│  │ test_scripts.py   │  │
│  │              │  │              │  │  (YAML runner)    │  │
│  └──────┬───────┘  └──────┬───────┘  └────────┬──────────┘  │
│         │                 │                    │             │
│         ▼                 ▼                    ▼             │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              TelTest Fixtures Layer                  │    │
│  │  • mud_server (session-scoped, starts GameEngine)   │    │
│  │  • mud_client (function-scoped, connects & logs in) │    │
│  │  • raw_client (function-scoped, bare connection)    │    │
│  └──────────────────────┬──────────────────────────────┘    │
│                         │                                    │
│                         ▼                                    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                   MUDClient                          │    │
│  │  • connect() / disconnect()                         │    │
│  │  • send(text) / expect(pattern)                     │    │
│  │  • expect_prompt() / expect_sequence()              │    │
│  │  • ANSI stripping / IAC negotiation                 │    │
│  │  • Output buffer with history                       │    │
│  └──────────────────────┬──────────────────────────────┘    │
│                         │ TCP                                │
└─────────────────────────┼───────────────────────────────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │     MAID Server       │
              │  (GameEngine + Telnet │
              │   on ephemeral port)  │
              └───────────────────────┘

Data Flow¶

pytest discovers test functions and invokes session-scoped mud_server fixture
mud_server starts a GameEngine with in-memory storage on a random free port
Each test gets a mud_client — a MUDClient instance connected over TCP
Test calls send() / expect() to drive the interaction
MUDClient handles telnet IAC negotiation transparently
On fixture teardown, client disconnects and server stops

Core Components¶

1. MUDClient — Async Telnet Driver¶

The heart of the framework. A pure-async telnet client with pattern-matching output expectations. Zero MAID-specific dependencies — it speaks raw telnet.

Responsibilities:

TCP connection management with configurable timeouts
Telnet IAC negotiation (respond to WILL/DO/WONT/DONT)
Optional GMCP support (send/receive structured data)
ANSI escape code stripping for clean text matching
Output buffering with rolling history
Pattern-based expect() with timeout and failure diagnostics
Prompt detection (text not terminated by \r\n)

Key design decisions:

Async-native — Built on asyncio.open_connection, no threads
Non-blocking reads — Continuous background reader task fills an output buffer; expect() scans from a read cursor with an asyncio.Event for new data
Cursor-based matching — Each expect() and send() advances a read cursor so assertions only match text received since the last interaction
IAC handled transparently — Tests never see protocol bytes
ANSI stripping is opt-in — MUDClient(strip_ansi=True) (default True for tests)
History preserved — Full session transcript available for debugging (ring buffer, configurable max size, default 10,000 lines)
Reader task contract — Created in connect(), cancelled in disconnect(). Reader exceptions are captured and re-raised in the next expect() call. disconnect() sets a sentinel that unblocks any waiting expect(). A reader_error property exposes the last exception for diagnostics.
EOF/crash handling — Socket close is detected immediately. If the server drops the connection, expect() raises ConnectionClosed (not a timeout). Fixtures can catch this to fail fast or skip remaining tests.

2. Expectation Engine¶

The expect() method is the primary assertion mechanism. It maintains a read cursor that advances through the output buffer, ensuring each expect() only matches text received after the previous expect() or send() call. This prevents stale output from prior commands from satisfying new assertions.

Design decision (post-review): All three adversarial reviewers identified "full buffer scan" as a critical race condition. The cursor-based approach eliminates an entire class of flaky tests where old output matches new patterns.

It supports:

Mode	Example	Description
Substring	`expect("Welcome")`	Waits for text containing "Welcome"
Regex	`expect(re.compile(r"Level \d+"))`	Waits for regex match
Exact line	`expect_line("Your choice: ")`	Matches a complete line exactly
Prompt	`expect_prompt("> ")`	Waits for a prompt (see Prompt Detection below)
Sequence	`expect_sequence(["Name:", "Race:", "Class:"])`	Ordered multi-pattern
Absence	`expect_not("Error", until="> ")`	Asserts text does NOT appear before boundary
Any of	`expect_any(["Yes", "No"])`	Returns whichever matches first
Full buffer	`expect("pattern", from_start=True)`	Ignores cursor; searches all output

Cursor behavior:

send() advances the cursor to the current buffer end
expect() searches from cursor forward; on match, advances cursor past the match
clear_buffer() resets both the buffer and the cursor
from_start=True bypasses the cursor for full-history assertions

Timeout behavior:

Default timeout: 5 seconds (configurable per-call and globally)
On timeout: raises ExpectTimeout with full buffer dump showing what WAS received
Timeout auto-scales: CI environments get 2x multiplier via TELTEST_TIMEOUT_MULTIPLIER

expect_sequence() timeout semantics:

timeout is the total time for all patterns combined (default)
per_step_timeout overrides with a per-pattern limit if provided
This is documented explicitly to avoid ambiguity

Prompt detection:

Prompts are text that arrives without a trailing \r\n. Because TCP packet fragmentation can cause false positives (a partial line looks like a prompt), expect_prompt() uses a multi-signal approach:

Primary: Telnet GA (Go Ahead) signal, if the server sends it after SGA negotiation
Secondary: Configurable prompt regex (e.g., r"^>|^\w+> ") — matches only known prompt patterns, not arbitrary partial lines
Fallback: Quiescence detection — if no new data arrives for a configurable prompt_settle_time (default 0.3s) and the buffer ends without \r\n, treat as prompt

Design decision (post-review): Gemini identified that relying solely on missing \r\n is a race condition with packet fragmentation. The multi-signal approach provides defense in depth.

3. Test Fixtures & Server Lifecycle¶

Session-scoped server fixture — The engine starts once per pytest session (or per-module if configured), avoiding the ~2s startup cost per test.

Function-scoped client fixture — Each test gets a fresh TCP connection. The fixture handles connect, optional login, and guaranteed disconnect on teardown.

State isolation — Between tests, the world state is snapshot/restored to prevent state bleed. See State Isolation & Determinism.

Fixture hierarchy:

mud_server (session) ─── starts GameEngine + telnet on ephemeral port
  │                      waits for readiness probe before yielding
  ├── mud_client (function) ─── connects, logs in via AuthDriver, selects character
  ├── raw_client (function) ─── connects only (for testing login flow itself)
  └── registered_account (session) ─── creates a test account; validates on each use

Port allocation:

Uses socket.bind(('', 0)) to get a free ephemeral port
Port passed to GameEngine settings before startup
Eliminates port conflicts in parallel CI

Server readiness:

The mud_server fixture waits for the telnet listener to accept connections before yielding, using a retry loop with exponential backoff:

async def _wait_for_ready(host: str, port: int, timeout: float = 15.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            r, w = await asyncio.open_connection(host, port)
            w.close()
            await w.wait_closed()
            return
        except (ConnectionRefusedError, OSError):
            await asyncio.sleep(0.1)
    raise TimeoutError(f"Server not ready on {host}:{port} after {timeout}s")

Server liveness:

If the engine crashes mid-session, subsequent tests skip gracefully:

@pytest.fixture
async def mud_client(mud_server):
    if not mud_server.is_alive():
        pytest.skip("Server crashed in a previous test")
    ...

AuthDriver abstraction:

The login flow is abstracted behind a pluggable AuthDriver protocol, keeping the fixture layer portable. MAID provides MaidAuthDriver; other MUDs implement their own:

class AuthDriver(Protocol):
    """Pluggable login flow for different MUD engines."""

    async def register(
        self, client: MUDClient, username: str, password: str, email: str
    ) -> None:
        """Register a new account via the MUD's registration flow."""

    async def login(
        self, client: MUDClient, username: str, password: str
    ) -> None:
        """Log in to an existing account."""

    async def select_character(
        self, client: MUDClient, character_name: str
    ) -> None:
        """Select or create a character and enter the game world."""

Design decision (post-review): All three reviewers identified the MAID-coupled fixtures as undermining portability claims. The AuthDriver protocol makes the fixture layer genuinely reusable across MUD engines.

4. Script DSL¶

A YAML-based format for declarative test scenarios. Aimed at game designers and QA who may not write Python.

# tests/e2e/scripts/test_character_creation.yaml
name: "Character creation - Human Warrior"
description: "Verify a new player can create a Human Warrior character"
tags: [smoke, character-creation]

setup:
  account:
    username: "testplayer"
    password: "testpass123"
    register: true

steps:
  - send: "1"
    expect: "Select your race"

  - send: "1"
    expect: "Select your class"

  - send: "1"
    expect: "Select your gender"

  - send: "3"
    expect: "Character Summary"
    expect_all:
      - "Human"
      - "Warrior"
      - "Neutral"

  - send: "Y"
    expect: "Welcome to the world"

  - send: "look"
    expect_any:
      - "You see"
      - "exits:"

  - send: "quit"
    expect: "Goodbye"

Script runner — A pytest plugin collects .yaml files from a configured directory and generates parametrized test cases. Each script becomes a test item in pytest output:

tests/e2e/test_scripts.py::test_character_creation_human_warrior PASSED
tests/e2e/test_scripts.py::test_basic_movement PASSED

5. Session Recording & Playback¶

Record mode — Wraps a MUDClient to capture all send/receive pairs with timestamps. Saves to a .teltest JSON file:

{
  "recorded_at": "2026-02-18T06:00:00Z",
  "server": "localhost:4000",
  "events": [
    {"t": 0.0, "type": "recv", "text": "Welcome to MAID\r\n"},
    {"t": 0.1, "type": "recv", "text": "Your choice: "},
    {"t": 1.2, "type": "send", "text": "R"},
    {"t": 1.4, "type": "recv", "text": "Choose a username: "},
    ...
  ]
}

Playback mode — Converts a recording into a YAML script or a Python test, replacing literal text with fuzzy patterns where appropriate.

CLI tool:

# Record a session
uv run teltest record localhost 4000 -o session.teltest

# Convert recording to YAML test
uv run teltest convert session.teltest -o test_flow.yaml

# Run scripts directly
uv run teltest run tests/e2e/scripts/ --server localhost:4000

API Design¶

MUDClient API¶

class MUDClient:
    """Async telnet client for MUD E2E testing.

    This client is MUD-engine agnostic. It speaks raw telnet and
    provides pattern-matching expectations on text output.

    Example:
        async with MUDClient("localhost", 4000) as client:
            await client.expect("Welcome")
            await client.send("connect user pass")
            await client.expect("Entering World")
    """

    def __init__(
        self,
        host: str = "localhost",
        port: int = 4000,
        *,
        timeout: float = 5.0,
        strip_ansi: bool = True,
        encoding: str = "utf-8",
        negotiate_gmcp: bool = False,
    ) -> None: ...

    # --- Connection lifecycle ---

    async def connect(self) -> None:
        """Open TCP connection and complete telnet negotiation."""

    async def disconnect(self) -> None:
        """Gracefully close the connection."""

    async def __aenter__(self) -> "MUDClient": ...
    async def __aexit__(self, *exc: object) -> None: ...

    @property
    def is_connected(self) -> bool: ...

    # --- Sending ---

    async def send(self, text: str) -> None:
        """Send a line of text (appends \\r\\n)."""

    async def send_raw(self, data: bytes) -> None:
        """Send raw bytes without modification."""

    # --- Expecting ---

    async def expect(
        self,
        pattern: str | re.Pattern[str],
        *,
        timeout: float | None = None,
        from_start: bool = False,
    ) -> Match:
        """Wait for output matching pattern. Returns Match with context.

        Searches output received since the last send()/expect() call (cursor-based).
        Set from_start=True to search the entire buffer history.

        Args:
            pattern: Substring or compiled regex to match against output.
            timeout: Override default timeout for this call.
            from_start: If True, search from buffer start ignoring cursor.

        Returns:
            Match object containing matched text, full line, and buffer context.

        Raises:
            ExpectTimeout: If pattern not found within timeout.
            ConnectionClosed: If the server closed the connection.
        """

    async def expect_prompt(
        self,
        prompt: str | re.Pattern[str] = "> ",
        *,
        timeout: float | None = None,
    ) -> str:
        """Wait for a prompt using multi-signal detection.

        Detection priority:
        1. Telnet GA (Go Ahead) signal
        2. Match against prompt pattern
        3. Quiescence (no new data for prompt_settle_time)
        """

    async def expect_line(
        self,
        text: str,
        *,
        timeout: float | None = None,
    ) -> str:
        """Wait for an exact complete line."""

    async def expect_sequence(
        self,
        patterns: list[str | re.Pattern[str]],
        *,
        timeout: float | None = None,
        per_step_timeout: float | None = None,
    ) -> list[Match]:
        """Wait for multiple patterns in order.

        Args:
            timeout: Total time allowed for all patterns (default).
            per_step_timeout: If set, each pattern gets this much time instead.
        """

    async def expect_any(
        self,
        patterns: list[str | re.Pattern[str]],
        *,
        timeout: float | None = None,
    ) -> tuple[int, Match]:
        """Wait for any of the patterns. Returns (index, match)."""

    async def expect_not(
        self,
        pattern: str | re.Pattern[str],
        *,
        until: str | re.Pattern[str],
        timeout: float | None = None,
    ) -> None:
        """Assert that pattern does NOT appear before the 'until' boundary.

        This avoids unconditional sleeps by waiting for a positive signal
        (e.g., the next prompt) and asserting the negative pattern didn't
        appear before it.

        Args:
            pattern: The text/regex that must NOT appear.
            until: A positive boundary pattern to wait for.
            timeout: Max time to wait for the 'until' boundary.

        Raises:
            UnexpectedMatch: If pattern appears before until.
            ExpectTimeout: If until is not found within timeout.
        """

    # --- Buffer access ---

    @property
    def output(self) -> str:
        """All received text since connection (ring buffer, max 10k lines)."""

    @property
    def recent(self) -> str:
        """Text received since last send() or expect() call (cursor-based)."""

    def clear_buffer(self) -> None:
        """Clear the output buffer and reset the read cursor."""

    @property
    def transcript(self) -> list[TranscriptEntry]:
        """Full session transcript with timestamps (ring buffer)."""

    # --- Connection health ---

    @property
    def reader_error(self) -> Exception | None:
        """Last exception from the background reader task, if any."""

    # --- GMCP (optional) ---

    async def send_gmcp(self, package: str, data: dict[str, Any]) -> None:
        """Send a GMCP message."""

    async def expect_gmcp(
        self,
        package: str,
        *,
        timeout: float | None = None,
    ) -> dict[str, Any]:
        """Wait for a GMCP message on the given package."""

    @property
    def gmcp_messages(self) -> list[tuple[str, dict[str, Any]]]:
        """All received GMCP messages."""


@dataclass(frozen=True)
class Match:
    """Result of a successful expect() call."""
    text: str           # The matched text
    line: str           # The full line containing the match
    pattern: str        # The pattern that matched
    elapsed: float      # Seconds waited
    buffer_before: str  # Buffer context before the match (last N lines)


@dataclass(frozen=True)
class TranscriptEntry:
    """Single entry in the session transcript."""
    timestamp: float
    direction: Literal["send", "recv", "gmcp_send", "gmcp_recv"]
    text: str


class ExpectTimeout(AssertionError):
    """Raised when expect() times out.

    Attributes:
        pattern: What we were looking for.
        timeout: How long we waited.
        buffer: What was actually received.
    """
    pattern: str
    timeout: float
    buffer: str


class ConnectionClosed(Exception):
    """Raised when the server closes the connection during expect().

    Attributes:
        buffer: Text received before the connection closed.
    """
    buffer: str


class UnexpectedMatch(AssertionError):
    """Raised by expect_not() when the forbidden pattern appears.

    Attributes:
        pattern: The pattern that should not have appeared.
        match: The Match object showing where it appeared.
    """
    pattern: str
    match: Match

Fixture API¶

@pytest.fixture(scope="session")
async def mud_server() -> AsyncGenerator[MUDServer, None]:
    """Start a MAID server for E2E testing.

    Starts GameEngine with in-memory storage on an ephemeral port.
    Waits for readiness probe before yielding.
    The server runs for the entire pytest session.

    Yields:
        MUDServer with .host, .port, .engine, .is_alive() attributes.
    """

@pytest.fixture
async def raw_client(mud_server: MUDServer) -> AsyncGenerator[MUDClient, None]:
    """A connected MUDClient with no login.

    Use this for testing the login/registration flow itself.
    Skips if server is not alive (crashed in a previous test).
    """

@pytest.fixture
async def mud_client(
    mud_server: MUDServer,
    registered_account: AccountInfo,
) -> AsyncGenerator[MUDClient, None]:
    """A MUDClient that is logged in and in the game world.

    Uses the configured AuthDriver to handle the login flow.
    Ready to send game commands immediately.
    Skips if server is not alive.
    """

@pytest.fixture(scope="session")
async def registered_account(mud_server: MUDServer) -> AccountInfo:
    """An account registered on the test server.

    Created once per session via the AuthDriver registration flow.
    Validates account still exists on each use.
    """

@dataclass
class MUDServer:
    """Handle to the running test server."""
    host: str
    port: int
    engine: GameEngine

    def is_alive(self) -> bool:
        """Check if the engine is still running."""

    async def wait_ticks(self, n: int = 1) -> None:
        """Block until N game ticks have been processed."""

    async def snapshot_world(self) -> WorldSnapshot:
        """Capture current world state for later restoration."""

    async def restore_world(self, snapshot: WorldSnapshot) -> None:
        """Restore world state from a snapshot."""

@dataclass
class AccountInfo:
    """Credentials for a test account."""
    username: str
    password: str
    email: str

Script DSL API¶

# Full schema for a TelTest script
name: string          # Required. Test name (becomes pytest node ID)
description: string   # Optional. Shown in verbose output
tags: [string]        # Optional. Maps to pytest markers
timeout: float        # Optional. Default timeout for all steps (default: 5.0)

setup:                # Optional. Pre-test configuration
  account:            # Account to use
    username: string
    password: string
    register: bool    # If true, register the account first
  character:          # Character to create/select
    name: string
    race: string      # e.g. "human", "elf"
    class: string     # e.g. "warrior", "mage"
    select: int       # Or select existing by index

steps:                # Required. Test actions
  - send: string                # Send a command
    expect: string              # Wait for substring (shorthand)
    expect_re: string           # Wait for regex
    expect_all: [string]        # Wait for ALL substrings (any order)
    expect_any: [string]        # Wait for ANY substring
    expect_not:                 # Assert text does NOT appear before boundary
      pattern: string
      until: string             # Required positive boundary
    expect_prompt: string       # Wait for specific prompt
    timeout: float              # Per-step timeout override
    delay: float                # Wait N seconds before this step

  - branch: string              # Send text and branch on response
    cases:
      "pattern A":              # If output contains this...
        - send: "..."           #   ...run these steps
      "pattern B":
        - send: "..."

  - note: string                # Comment in the transcript (no action)

  - group: string               # Named step group for reporting
    steps: [...]                 # Nested steps

teardown:              # Optional. Cleanup actions
  - send: "quit"

Telnet Protocol Handling¶

The MUDClient must handle telnet at the byte level. The design isolates protocol handling into a TelnetProtocol layer.

  TCP bytes in
       │
       ▼
┌──────────────┐
│ TelnetProtocol│──── IAC negotiation (auto-respond)
│              │──── MCCP2 decompression
│              │──── GMCP extraction
└──────┬───────┘
       │ clean text
       ▼
┌──────────────┐
│  ANSIStripper │──── Remove \x1b[...m sequences
└──────┬───────┘
       │ plain text
       ▼
┌──────────────┐
│ OutputBuffer  │──── Line splitting, prompt detection
└──────────────┘

IAC negotiation strategy:

Server sends	Client responds	Notes
`WILL GMCP (201)`	`DO GMCP` (if gmcp enabled) or `DONT GMCP`	GMCP opt-in
`WILL ECHO (1)`	`DO ECHO`	Password hiding
`WILL SGA (3)`	`DO SGA`	Standard
`DO TTYPE (24)`	`WILL TTYPE`, then send "TELTEST"	Identifies as TelTest client
`DO NAWS (31)`	`WILL NAWS`, then send 80×24	Standard terminal size
`DO MSDP (69)`	`WONT MSDP`	Not needed for testing
`DO MXP (91)`	`WONT MXP`	Not needed for testing
`WILL MCCP2 (86)`	`DONT MCCP2`	Compression adds complexity; decline by default
Anything else	`WONT` or `DONT`	Safe default: refuse unknown options

Prompt detection:

Prompts are text that arrives without a trailing \r\n. The buffer tracks whether the last chunk ended mid-line. When expect_prompt() is called, it matches against the current unterminated line in the buffer.

State Isolation & Determinism¶

This section addresses the #1 concern from adversarial review. All three reviewers (Opus, Codex, Gemini) identified shared mutable state as the most critical risk to test reliability.

The Problem¶

The mud_server fixture is session-scoped for performance (~2s startup cost). But this means all tests share one GameEngine and one World. Without isolation:

Test A creates character "Thorn" → Test B sees "Thorn" in the room
Test C drops a sword → Test D picks it up
Test ordering becomes load-bearing → flaky CI

Solution: World Snapshot/Restore¶

The mud_server fixture exposes snapshot_world() / restore_world() methods. A function-scoped autouse fixture captures state before each test and restores it after:

@pytest.fixture(autouse=True)
async def _isolate_world(mud_server: MUDServer) -> AsyncGenerator[None, None]:
    """Snapshot and restore world state around each test."""
    snapshot = await mud_server.snapshot_world()
    yield
    await mud_server.restore_world(snapshot)

What gets snapshot/restored:

Entity registry (entities, components, tags)
Room index and spatial data
In-memory document store collections
Command registry state

What is NOT restored (intentionally):

Account registry (session-scoped accounts persist)
Engine configuration
Network listener state

Tick Synchronization¶

The game engine runs a tick loop that processes systems asynchronously. Tests that depend on system processing (combat, NPC AI, item respawn) need to synchronize with the tick loop.

# Wait for the combat system to process the attack
await mud_client.send("attack goblin")
await mud_server.wait_ticks(2)  # Wait for 2 ticks to process
await mud_client.expect("You hit")

API:

class MUDServer:
    async def wait_ticks(self, n: int = 1) -> None:
        """Block until N ticks have been processed.

        Uses an asyncio.Event that the engine tick loop signals after each tick.
        """

    async def pause_ticks(self) -> None:
        """Pause the tick loop for deterministic step-through testing."""

    async def step_tick(self) -> None:
        """Process exactly one tick while paused."""

    async def resume_ticks(self) -> None:
        """Resume the tick loop after pausing."""

Design decision: Tick sync is exposed on MUDServer (not MUDClient) because it requires engine access. The client is intentionally ignorant of the server's internals to maintain the portability boundary.

Unique Test Data¶

To further reduce state coupling, fixtures generate unique data per test:

@pytest.fixture
def unique_name() -> str:
    """Generate a unique character/account name per test."""
    return f"test_{uuid4().hex[:8]}"

This prevents name collisions even when snapshot/restore is not perfect.

Test Patterns & Examples¶

@pytest.mark.e2e
async def test_register_login_create_character(raw_client: MUDClient) -> None:
    """Test complete new-player experience."""
    # Registration
    await raw_client.expect("Please select an option")
    await raw_client.send("R")
    await raw_client.expect("Choose a username")
    await raw_client.send("testplayer")
    await raw_client.expect("Email address")
    await raw_client.send("test@example.com")
    await raw_client.expect("Choose a password")
    await raw_client.send("secret123")
    await raw_client.expect("Confirm password")
    await raw_client.send("secret123")
    await raw_client.expect("Account created")

    # Character creation
    await raw_client.expect("Character Selection")
    await raw_client.send("C")
    await raw_client.expect("Enter character name")
    await raw_client.send("Thorn")
    await raw_client.expect("Select your race")
    await raw_client.send("1")  # Human
    await raw_client.expect("Select your class")
    await raw_client.send("1")  # Warrior
    await raw_client.expect("Select your gender")
    await raw_client.send("3")  # Neutral
    await raw_client.expect("Character Summary")
    await raw_client.send("Y")
    await raw_client.expect("Welcome to the world, Thorn!")

    # In world
    await raw_client.expect("Entering World")
    await raw_client.send("quit")
    await raw_client.expect("Goodbye")

Pattern 2: Gameplay Commands¶

@pytest.mark.e2e
async def test_look_and_move(mud_client: MUDClient) -> None:
    """Test basic navigation. mud_client is already logged in."""
    await mud_client.send("look")
    match = await mud_client.expect(re.compile(r"exits?:", re.IGNORECASE))
    assert match.text  # Room has exits

    await mud_client.send("inventory")
    await mud_client.expect_any(["You are carrying", "Your inventory is empty"])

Pattern 3: Multi-Client Interaction¶

@pytest.mark.e2e
async def test_player_sees_other_player(mud_server: MUDServer) -> None:
    """Two players in the same room can see each other."""
    async with logged_in_client(mud_server, "alice") as alice, \
               logged_in_client(mud_server, "bob") as bob:
        # Both start in the same room
        await alice.send("look")
        await alice.expect("Bob")

        await bob.send("say Hello!")
        await alice.expect("Bob says")

Pattern 4: YAML Script¶

name: "Basic navigation smoke test"
tags: [smoke, navigation]

setup:
  account:
    username: "navtest"
    password: "testpass123"
    register: true
  character:
    name: "Navigator"
    race: "human"
    class: "warrior"

steps:
  - send: "look"
    expect: "exits"

  - send: "inventory"
    expect_any: ["carrying", "empty"]

  - send: "quit"
    expect: "Goodbye"

Error Handling & Diagnostics¶

ExpectTimeout Failure Output¶

When expect() times out, the error message provides full context:

teltest.ExpectTimeout: Pattern not found within 5.0s

  Expected: "Welcome to the world"

  Received (last 20 lines):
  ─────────────────────────
  |  Select your gender:
  |
  |    [1] Male
  |    [2] Female
  |    [3] Neutral
  |
  |    [C]ancel - Return to selection
  |
  |  Select gender (1-3): █  ← (prompt, waiting for input)
  ─────────────────────────

  Transcript (last 5 interactions):
    [0.00s] RECV: "Character Summary"
    [0.01s] RECV: "Create this character?"
    [0.50s] SEND: "Y"
    [0.52s] RECV: "Character created successfully."
    [5.00s] TIMEOUT waiting for "Welcome to the world"

  Hint: The server may be waiting for additional input. Check if
        a prompt was expected before this step.

Connection Failure¶

teltest.ConnectionFailed: Could not connect to localhost:4000

  Attempted: 3 retries over 6.0s
  Last error: ConnectionRefusedError: [Errno 111] Connection refused

  Hint: Is the server running? Check the mud_server fixture log.

Diagnostics Features¶

--teltest-verbose — Print full transcript to stdout on failure
--teltest-record — Save .teltest recording for every test
--teltest-timeout-multiplier=N — Scale all timeouts (useful for slow CI)
Transcript on every assertion failure — via pytest plugin hook

Configuration¶

Environment Variables¶

Variable	Default	Description
`TELTEST_HOST`	`localhost`	Server host for standalone runs
`TELTEST_PORT`	`4000`	Server port for standalone runs
`TELTEST_TIMEOUT`	`5.0`	Default expect timeout (seconds)
`TELTEST_TIMEOUT_MULTIPLIER`	`1.0`	Multiply all timeouts (for CI)
`TELTEST_STRIP_ANSI`	`true`	Strip ANSI escape codes
`TELTEST_SCRIPT_DIR`	`tests/e2e/scripts`	YAML script search directory
`TELTEST_RECORDING_DIR`	`tests/e2e/recordings`	Recording output directory

pytest.ini / pyproject.toml¶

[tool.pytest.ini_options]
markers = [
    "e2e: End-to-end tests requiring a running server",
]

[tool.teltest]
timeout = 5.0
timeout_multiplier = 1.0
strip_ansi = true
script_dirs = ["tests/e2e/scripts"]
server_startup_timeout = 15.0

CI/CD Integration¶

Added post-review. Opus and Gemini flagged the missing CI section as a gap given success criterion SC-4.

GitHub Actions Workflow¶

name: E2E Tests
on: [push, pull_request]

jobs:
  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    env:
      MAID_DEBUG: "true"
      TELTEST_TIMEOUT_MULTIPLIER: "2.0"

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Run E2E tests
        run: uv run pytest tests/e2e/ -m e2e -x --timeout=120

      - name: Upload transcripts on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: teltest-transcripts
          path: tests/e2e/recordings/

Key CI Considerations¶

Timeout multiplier: Set TELTEST_TIMEOUT_MULTIPLIER=2.0 in CI — shared runners have variable CPU availability
-x (fail fast): Stop on first failure since server crash makes all subsequent tests pointless
Separate from unit tests: Run pytest -m e2e separately from pytest -m "not e2e" so unit test failures don't block E2E and vice versa
Port isolation: Ephemeral ports via socket.bind(('', 0)) — no hardcoded ports
Zombie prevention: The mud_server fixture registers an atexit handler to terminate the engine. The GitHub Actions timeout-minutes provides a hard cap.
Docker networking: localhost works in CI containers. No special networking needed since server and tests run in the same process.
pytest-xdist support: For parallel execution, each xdist worker gets its own mud_server fixture (session-scoped per worker). Port allocation is already per-instance via ephemeral binding.

Running E2E Locally¶

# Run all E2E tests
uv run pytest tests/e2e/ -m e2e

# Run just YAML script tests
uv run pytest tests/e2e/test_scripts.py

# Run with verbose transcript output
uv run pytest tests/e2e/ -m e2e --teltest-verbose

# Validate YAML scripts without running
uv run teltest validate tests/e2e/scripts/

Package Structure¶

packages/teltest/
├── pyproject.toml              # Standalone package, no MAID deps
├── src/teltest/
│   ├── __init__.py             # Public API exports
│   ├── client.py               # MUDClient class
│   ├── protocol.py             # TelnetProtocol (IAC handling)
│   ├── ansi.py                 # ANSI escape code stripper
│   ├── buffer.py               # OutputBuffer with line/prompt tracking
│   ├── expect.py               # Expectation engine (Match, ExpectTimeout)
│   ├── gmcp.py                 # GMCP send/receive helpers
│   ├── recorder.py             # Session recording
│   ├── script/
│   │   ├── __init__.py
│   │   ├── schema.py           # YAML script schema (Pydantic models)
│   │   ├── runner.py           # Script execution engine
│   │   └── converter.py        # Recording → script converter
│   ├── cli.py                  # CLI entry point (record, convert, run)
│   └── pytest_plugin.py        # Fixtures, markers, CLI options
│
├── tests/                      # TelTest's own tests
│   ├── test_client.py
│   ├── test_protocol.py
│   ├── test_ansi.py
│   ├── test_buffer.py
│   ├── test_expect.py
│   ├── test_script_runner.py
│   └── conftest.py
│
└── README.md                   # Standalone package docs

# MAID-specific E2E tests live in the main repo:
tests/e2e/
├── conftest.py                 # mud_server, mud_client, registered_account fixtures
├── test_login.py               # Login/registration flow tests
├── test_character.py           # Character creation/selection tests
├── test_gameplay.py            # Core gameplay command tests
├── test_multiplayer.py         # Multi-client interaction tests
└── scripts/                    # YAML test scripts
    ├── smoke_login.yaml
    ├── smoke_navigation.yaml
    └── character_creation_matrix.yaml

Key decision: TelTest is a standalone package with zero MAID dependencies. It can be published independently and used with any MUD that speaks telnet. The MAID-specific fixtures and tests live in the main repo's tests/e2e/ directory.

Migration & Adoption¶

Phase 1: Core Client & Fixtures¶

Implement MUDClient, TelnetProtocol, OutputBuffer, ANSIStripper
Implement mud_server, raw_client, mud_client fixtures
Write first E2E test: full login→create→look→quit flow
Add pytest -m e2e marker to CI pipeline

Phase 2: Script DSL & Recording¶

Implement YAML script schema and runner
Implement session recorder and converter
Record reference sessions for smoke tests
Onboard QA team with script authoring docs

Phase 3: Advanced Features¶

GMCP assertion support
Multi-client test helpers
Parametrized character creation matrix (all race/class combos)
CI integration with timeout multiplier and parallel execution

Future Work¶

WebSocket client — A companion WSMUDClient for testing the /ws/game path with JSON message assertions
Fuzz testing — Random input generation to find crash-inducing sequences
Visual diff — Side-by-side comparison of expected vs. actual session transcripts
MUD compatibility matrix — Test TelTest against other MUD engines (Evennia, Ranvier, etc.) to validate portability
pytest-xdist support — Parallel E2E test execution with isolated server instances
AI-assisted test generation — Use LLM to convert natural-language scenarios ("test that a warrior can equip a sword") into test scripts

Appendix A: Telnet Protocol Reference¶

Byte	Name	Description
255	IAC	Interpret As Command
254	DONT	Refuse option
253	DO	Request option
252	WONT	Will not use option
251	WILL	Will use option
250	SB	Subnegotiation Begin
240	SE	Subnegotiation End
1	ECHO	Echo
3	SGA	Suppress Go-Ahead
24	TTYPE	Terminal Type
31	NAWS	Window Size
69	MSDP	MUD Server Data Protocol
70	MSSP	MUD Server Status Protocol
86	MCCP2	Compression v2
91	MXP	MUD eXtension Protocol
201	GMCP	Generic MUD Communication Protocol

Connect to server
       │
       ▼
"Please select an option"
  [L]ogin  [R]egister  [Q]uit
       │
       ├── L ──► "Username: " → "Password: " → Login validation
       │              │                              │
       │              │                    ┌─────────┴─────────┐
       │              │                  Success             Fail (3 max)
       │              │                    │                    │
       │              │                    ▼                    ▼
       │              │            Character Selection     Disconnect
       │              │
       ├── R ──► "Choose a username: " → "Email: " → "Password: " → "Confirm: "
       │                                                                │
       │                                                       Account Created
       │                                                                │
       │                                                       Character Selection
       │
       └── Q ──► Disconnect

Character Selection
       │
       ├── [N] Select existing ──► "Welcome back, {name}!" ──► Enter World
       │
       └── [C] Create new ──► Name → Race (1-10) → Class (1-10) → Gender (1-3)
                                    → Confirm (Y/N) → "Welcome to the world!" → Enter World

Enter World
       │
       ▼
"[Entering World...]"
{look output}
"> " (prompt)
       │
       ▼
   Main Loop: receive command → execute → send output → prompt
       │
       └── "quit" / "exit" ──► "Goodbye!" ──► Disconnect

Appendix C: Adversarial Review Summary¶

This design was reviewed by three adversarial agents (Claude Opus, GPT Codex, Gemini Pro) tasked with finding flaws, gaps, and unrealistic assumptions. Their feedback was synthesized and incorporated into the design above. This appendix records the key findings for traceability.

Unanimous Critical Findings (all 3 reviewers)¶

#	Finding	Resolution
1	Session-scoped server with shared mutable world state causes test bleed	Added State Isolation section with snapshot/restore
2	`expect()` scanning full buffer matches stale output from prior commands	Redesigned to cursor-based matching; `from_start` opt-in for full scan
3	Portability claim undermined by MAID-coupled fixtures	Added `AuthDriver` protocol abstraction in fixtures

Unanimous Major Findings (2+ reviewers)¶

#	Finding	Resolution
4	`expect_not()` is an unconditional sleep with no correctness guarantee	Redesigned with required `until` positive boundary parameter
5	No tick synchronization primitive for deterministic testing	Added `wait_ticks()`, `pause_ticks()`, `step_tick()` to `MUDServer`
6	Prompt detection via missing `\r\n` is a TCP fragmentation race	Added multi-signal detection (GA, regex, quiescence)
7	No server readiness check before tests begin	Added readiness probe retry loop in `mud_server` fixture
8	Reader task lifecycle gaps (crash, EOF, cancellation)	Specified reader contract: capture + re-raise, sentinel on disconnect
9	Unbounded transcript buffer	Changed to ring buffer (10k lines default)
10	Missing CI/CD section	Added CI/CD Integration section
11	`expect_sequence()` timeout semantics ambiguous	Added explicit `timeout` (total) vs `per_step_timeout` parameters
12	No server liveness check after crash	Added `is_alive()` with `pytest.skip` in fixtures

Notable Minor Findings¶

YAML DSL needs branch support for non-happy-path scenarios (added)
YAML script validation should happen before execution (added teltest validate CLI)
logged_in_client() helper used in examples but not defined in API (noted for implementation)
Session recording should optionally capture IAC negotiation bytes (noted for Phase 3)
Tags-to-markers mapping needs explicit rules (noted for implementation)

Reviewer Kudos (consensus strengths)¶

Standalone package separation (packages/teltest/ vs tests/e2e/)
ExpectTimeout error output design (buffer dump + transcript + hints)
Ephemeral port allocation for CI
ANSI stripping as default with opt-out
MCCP2 declined by default (pragmatic)
Phased migration plan (ship Phase 1 first)

TelTest — E2E Testing Framework for MUDs¶

Table of Contents¶

Executive Summary¶

Problem Statement & Current State¶

What Exists Today¶

The Problem¶

Success Criteria¶

Design Goals & Non-Goals¶

Goals¶

Non-Goals¶

Architecture Overview¶

Data Flow¶

Core Components¶

1. MUDClient — Async Telnet Driver¶

2. Expectation Engine¶

3. Test Fixtures & Server Lifecycle¶

4. Script DSL¶

5. Session Recording & Playback¶

API Design¶

MUDClient API¶

Fixture API¶

Script DSL API¶

Telnet Protocol Handling¶

State Isolation & Determinism¶

The Problem¶

Solution: World Snapshot/Restore¶

Tick Synchronization¶

Unique Test Data¶

Test Patterns & Examples¶

Pattern 1: Full Login Flow¶

Pattern 2: Gameplay Commands¶

Pattern 3: Multi-Client Interaction¶

Pattern 4: YAML Script¶

Error Handling & Diagnostics¶

ExpectTimeout Failure Output¶

Connection Failure¶

Diagnostics Features¶

Configuration¶

Environment Variables¶

pytest.ini / pyproject.toml¶

CI/CD Integration¶

GitHub Actions Workflow¶

Key CI Considerations¶

Running E2E Locally¶

Package Structure¶

Migration & Adoption¶

Phase 1: Core Client & Fixtures¶

Phase 2: Script DSL & Recording¶

Phase 3: Advanced Features¶

Future Work¶

Appendix A: Telnet Protocol Reference¶

Appendix B: MAID Login Flow Reference¶

Appendix C: Adversarial Review Summary¶

Unanimous Critical Findings (all 3 reviewers)¶

Unanimous Major Findings (2+ reviewers)¶

Notable Minor Findings¶

Reviewer Kudos (consensus strengths)¶