ADR-002: Async/Await Throughout the Codebase
Status
Accepted
Date
2024-01-15
Context
A MUD engine is inherently concurrent. Hundreds of players connect simultaneously via Telnet and WebSocket, each issuing commands, receiving room descriptions, and interacting with AI-powered NPCs. The engine must also handle a tick-based game loop, periodic AI API calls (with high latency), database I/O, and external bridge connections (IRC, Discord, RSS).
Traditional MUD engines in Python either use threads (one per connection) or blocking I/O with a homegrown event loop. Both approaches have significant drawbacks: threads introduce synchronization complexity and memory overhead, while custom event loops are hard to maintain and lack ecosystem support.
Decision
Use Python's `asyncio` as the single concurrency model for all I/O operations throughout the entire codebase. Specifically:

- The game tick loop in `GameEngine._tick_loop()` (`packages/maid-engine/src/maid_engine/core/engine.py`) is an async loop using `asyncio.sleep()` for timing.
- All ECS `System.update(delta)` methods are `async`.
- All `ContentPack` lifecycle hooks (`on_load`, `on_unload`) are async.
- Network layers (Telnet in `packages/maid-engine/src/maid_engine/net/telnet/`, WebSocket in `packages/maid-engine/src/maid_engine/net/web/`) use `asyncio.StreamReader` and `asyncio.StreamWriter` or FastAPI async endpoints.
- The `DocumentStore` API and `AccountManager` authentication methods are all async.
- External bridges like `IRCBridge` (`packages/maid-engine/src/maid_engine/bridges/irc_bridge.py`) use `asyncio.open_connection()` for TCP and `asyncio.Event` for coordination.
- The `ConversationManager` for AI dialogue has all-async methods to support async `DocumentStore` persistence.
- Hot reload operations use `asyncio.Event` for tick-loop pause/resume handshaking (`_hot_reload_pause`, `_hot_reload_paused_ack` in `GameEngine`).
Consequences
Positive
- Single-threaded concurrency: No locks are needed for most game state mutations. The `World`, `EntityManager`, and `SystemManager` can be accessed without synchronization because only one coroutine runs at a time between `await` points; a mutation stays atomic as long as it does not span an `await`.
- Natural I/O multiplexing: Hundreds of Telnet/WebSocket connections are handled by a single thread. AI API calls to Anthropic/OpenAI/Ollama providers naturally yield during network waits.
- Ecosystem compatibility: FastAPI (for the admin REST API), aiohttp (for AI provider HTTP clients), and asyncio streams (for Telnet/IRC) all share the same event loop without adapter layers.
- Tick-loop integration: `asyncio.sleep()` in the tick loop naturally yields to connection handlers and background tasks between ticks.
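The lock-free guarantee holds only between `await` points. The sketch below uses a hypothetical `Room` class in place of real `World` state to contrast a read-modify-write that finishes before yielding with one that spans an `await` (standing in for a database or AI call).

```python
import asyncio


class Room:
    """Hypothetical room holding a gold pile; stands in for shared World state."""

    def __init__(self) -> None:
        self.gold = 0


async def deposit_atomic(room: Room, amount: int) -> None:
    # Read-modify-write with no await in between: no other coroutine can
    # run in the middle, so no lock is needed.
    room.gold += amount
    await asyncio.sleep(0)  # yield only after the mutation is complete


async def deposit_split(room: Room, amount: int) -> None:
    # Read, then await, then write: other coroutines run at the await,
    # and their updates are overwritten (a lost update).
    current = room.gold
    await asyncio.sleep(0)
    room.gold = current + amount


async def simulate(deposit, players: int = 100) -> int:
    room = Room()
    await asyncio.gather(*(deposit(room, 1) for _ in range(players)))
    return room.gold


if __name__ == "__main__":
    print(asyncio.run(simulate(deposit_atomic)))  # 100
    print(asyncio.run(simulate(deposit_split)))   # 1: every task read 0 before any wrote
```

In practice this means game-state mutations should be completed before awaiting I/O, or re-read after the `await` returns.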
Negative
- Viral async: Every caller must `await` async functions, so async-ness propagates up the call chain, and a single blocking synchronous call anywhere in that chain stalls the entire event loop. This is why even the `EventBus` provides `emit_sync` for fire-and-forget events within the tick.
- Testing complexity: Tests require `pytest-asyncio` and `@pytest.mark.asyncio` decorators. Mocking async methods requires `AsyncMock`. Test setup often needs an event loop fixture.
- CPU-bound work: Long-running computations (e.g., A* pathfinding in `GridManager`, procedural terrain generation in `SimplexNoise`) can block the event loop. These must be kept fast or offloaded to thread executors (for example with `asyncio.to_thread()` or `loop.run_in_executor()`).
- Stack traces: Async stack traces are harder to read than synchronous ones, especially when tracing through the tick loop, system manager, and individual system updates.
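A rough sketch of the CPU-bound problem and the thread-executor workaround. `expensive_pathfind` is a hypothetical stand-in for work like A* over a large grid, and the `heartbeat` task plays the role of connection handlers; it counts how often the loop gets to run while the heavy work is in progress. `asyncio.to_thread()` requires Python 3.9+.

```python
import asyncio


def expensive_pathfind(n: int) -> int:
    """Hypothetical CPU-heavy stand-in (sum of squares), not real pathfinding."""
    total = 0
    for i in range(n):
        total += i * i
    return total


async def heartbeat(stop: asyncio.Event, interval: float = 0.01) -> int:
    """Counts how many times the event loop lets this task run."""
    beats = 0
    while not stop.is_set():
        beats += 1
        await asyncio.sleep(interval)
    return beats


async def compute(n: int, offload: bool) -> tuple[int, int]:
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(stop))
    await asyncio.sleep(0)  # let the heartbeat take its first beat
    if offload:
        # Runs in the default thread pool; the loop keeps servicing other tasks.
        result = await asyncio.to_thread(expensive_pathfind, n)
    else:
        # Called inline: no other task runs until this returns.
        result = expensive_pathfind(n)
    stop.set()
    beats = await hb
    return result, beats


if __name__ == "__main__":
    print("inline:   ", asyncio.run(compute(2_000_000, offload=False)))
    print("offloaded:", asyncio.run(compute(2_000_000, offload=True)))
```

Inline, the heartbeat only gets its initial beat; offloaded, it keeps beating. Pure-Python work in a thread still contends for the GIL, so a process pool passed to `loop.run_in_executor()` may suit heavier computations better.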
Alternatives Considered
Threading (one thread per connection)
Rejected due to GIL contention, memory overhead of hundreds of threads, and the
need for locks around all shared game state (World, EntityManager, rooms).
Gevent / Green Threads
Gevent patches standard library I/O to be cooperative. While this avoids explicit async/await syntax, it introduces implicit yielding that is harder to reason about, and it is incompatible with many modern Python libraries (FastAPI, Pydantic v2).
Synchronous with Select-Based Event Loop
Writing a custom select()-based loop (as many classic MUDs do) was rejected
because it would preclude use of FastAPI for the admin API, modern HTTP clients
for AI providers, and the broader asyncio ecosystem.