Technical Debt Tracking

This document tracks known technical debt in the MAID codebase. Technical debt represents areas where we've made pragmatic trade-offs that should be addressed in future refactoring efforts.

Overview

Technical debt is not inherently bad - it often represents reasonable decisions made under constraints. This document helps us:

  1. Track known debt so it doesn't get forgotten
  2. Prioritize which items to address first
  3. Plan migration paths for future refactoring
  4. Onboard new contributors by explaining why certain patterns exist

Active Technical Debt Items

TD-001: Module-Level Singletons in API Layer

Status: Partially migrated (Phase 1 complete - app.state-aware getters added)
Priority: Medium
Affected Files:

  • packages/maid-engine/src/maid_engine/api/auth.py
  • packages/maid-engine/src/maid_engine/api/admin/websocket.py
  • packages/maid-engine/src/maid_engine/api/admin/dashboard.py
  • packages/maid-engine/src/maid_engine/api/admin/entities.py
  • packages/maid-engine/src/maid_engine/api/v1/events_ws.py

Migration Progress (2026-02-04):

Phase 1 of the migration has been completed. All affected modules now have app.state-aware getter functions that:

  1. Check request.app.state first for the dependency
  2. Fall back to the module-level global if not found in app.state

This provides backwards compatibility while enabling new code to use proper dependency injection. The server initialization code (net/web/server.py) now stores auth components in app.state.
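The check-then-fall-back behavior can be sketched roughly as follows. This is a minimal illustration, not the code in websocket.py: the attribute name ws_manager on app.state and the stand-in WebSocketManager class are assumptions, and SimpleNamespace objects take the place of a real FastAPI request so the sketch runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Optional


class WebSocketManager:
    """Stand-in for the real manager; only object identity matters here."""


# Module-level fallback, as in the legacy singleton pattern.
_ws_manager: Optional[WebSocketManager] = None


def get_websocket_manager() -> WebSocketManager:
    """Legacy getter: lazily create the module-level singleton."""
    global _ws_manager
    if _ws_manager is None:
        _ws_manager = WebSocketManager()
    return _ws_manager


def get_ws_manager_from_app(request: Any) -> WebSocketManager:
    """Prefer request.app.state; fall back to the module-level global."""
    # "ws_manager" is an assumed attribute name for illustration.
    manager = getattr(request.app.state, "ws_manager", None)
    if manager is not None:
        return manager
    return get_websocket_manager()


# Stand-in request objects: one with the dependency in app.state, one without.
injected = WebSocketManager()
request_with_state = SimpleNamespace(
    app=SimpleNamespace(state=SimpleNamespace(ws_manager=injected))
)
request_without_state = SimpleNamespace(app=SimpleNamespace(state=SimpleNamespace()))

from_state = get_ws_manager_from_app(request_with_state)
from_global = get_ws_manager_from_app(request_without_state)
```

With this shape, code that populates app.state gets the injected instance, while older code paths keep receiving the lazily created global until they are migrated.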

New app.state-aware functions added:

  • auth.py: _get_key_store_from_app(), _get_rate_limiter_from_app(), _get_auth_rate_limiter_from_app()
  • websocket.py: get_ws_manager_from_app(), get_event_broadcaster_from_app()
  • dashboard.py: get_event_buffer_from_app(), get_metrics_collector_from_app(), get_websocket_handler_from_app()
  • entities.py: get_component_registry_from_app()
  • events_ws.py: get_event_broadcaster_from_app()

Remaining work (Phase 2-4):

  • Phase 2: Update remaining server initialization to store all singletons in app.state
  • Phase 3: Update route functions to use Depends() with the new functions (partially done)
  • Phase 4: Update tests to use app.dependency_overrides instead of patching globals

Description:

Several API modules use module-level singletons (global variables with lazy initialization) to manage stateful objects like WebSocket managers, event broadcasters, metrics collectors, and component registries.

Current Pattern:

# Module-level global variable
_ws_manager: WebSocketManager | None = None

def get_websocket_manager() -> WebSocketManager:
    """Get or create the global WebSocket manager."""
    global _ws_manager
    if _ws_manager is None:
        _ws_manager = WebSocketManager()
    return _ws_manager

def reset_websocket_manager() -> None:
    """Reset the global WebSocket manager. Useful for testing."""
    global _ws_manager
    _ws_manager = None

Why This Exists:

The singleton pattern was adopted for simplicity during initial development. It provides:

  • Simple access to shared state from anywhere in the API layer
  • Lazy initialization (objects created only when needed)
  • Easy integration with FastAPI's dependency injection via Depends(get_websocket_manager)

Problems:

  1. Testing Difficulties
     • Tests must manually reset state between runs using reset_*() functions
     • State from one test can leak into another if fixtures aren't configured correctly
     • Makes it harder to reason about test isolation

  2. Parallel Test Execution
     • Tests that modify these globals cannot safely run in parallel
     • This limits test suite performance on multi-core machines

  3. Implicit Dependencies
     • The init pattern (e.g., init_auth()) requires careful setup/teardown
     • Dependencies between modules are hidden rather than explicit
     • Makes code harder to understand and maintain

  4. Single Instance Limitation
     • Cannot easily create multiple instances for different scenarios
     • Limits flexibility in testing edge cases
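The state-leak problem is easiest to see in a small reproduction. The sketch below assumes a hypothetical WebSocketManager that only tracks connection names in a connections list; the real class is not shown in this document and will differ.

```python
from typing import List, Optional


class WebSocketManager:
    """Hypothetical stand-in that only records connection names."""

    def __init__(self) -> None:
        self.connections: List[str] = []


_ws_manager: Optional[WebSocketManager] = None


def get_websocket_manager() -> WebSocketManager:
    global _ws_manager
    if _ws_manager is None:
        _ws_manager = WebSocketManager()
    return _ws_manager


def reset_websocket_manager() -> None:
    global _ws_manager
    _ws_manager = None


def connect_client(name: str) -> int:
    """Simulates a test body that mutates the shared manager."""
    manager = get_websocket_manager()
    manager.connections.append(name)
    return len(manager.connections)


first = connect_client("client-a")     # fresh singleton: one connection
leaked = connect_client("client-b")    # no reset in between: client-a is still there
reset_websocket_manager()
isolated = connect_client("client-c")  # after reset: a fresh singleton again
```

The second call sees two connections although it only added one, which is the kind of cross-test contamination that the reset_*() functions exist to prevent.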

Current Workaround:

We provide reset functions and a unified reset_all_api_state() function for testing:

# In test fixtures
import pytest

@pytest.fixture(autouse=True)
def reset_api_state():
    yield  # Run the test first
    reset_all_api_state()  # Then reset all API singletons

Individual reset functions are also available:

  • reset_auth_singletons() - Auth module (key store, rate limiter)
  • reset_admin_api_state() - All admin API singletons
  • reset_dashboard_state() - Dashboard metrics and event buffer
  • reset_websocket_state() - WebSocket manager and broadcaster
  • reset_component_registry() - Component registry
  • reset_events_ws_state() - v1 events WebSocket broadcaster
  • reset_audit_log_store() - Middleware audit log store

Recommended Migration Path:

Use FastAPI's built-in dependency injection more fully:

Phase 1: App State Storage

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize state during startup
    app.state.ws_manager = WebSocketManager()
    app.state.key_store = APIKeyStore(...)
    app.state.rate_limiter = RateLimiter(...)
    app.state.metrics_collector = MetricsCollector()
    yield
    # Cleanup during shutdown
    await app.state.ws_manager.shutdown()

Phase 2: Dependency Functions

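from fastapi import Request, Depends
from starlette.requests import HTTPConnection

# HTTPConnection is the common base of Request and WebSocket, so this
# dependency also resolves inside WebSocket routes (a Request parameter would not).
def get_ws_manager(conn: HTTPConnection) -> WebSocketManager:
    """Dependency that retrieves WebSocketManager from app state."""
    return conn.app.state.ws_manager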

def get_key_store(request: Request) -> APIKeyStore:
    """Dependency that retrieves APIKeyStore from app state."""
    return request.app.state.key_store

Phase 3: Use in Routes

from typing import Annotated

from fastapi import APIRouter, Depends, WebSocket

router = APIRouter()

@router.websocket("/ws")
async def websocket_endpoint(
    websocket: WebSocket,
    manager: Annotated[WebSocketManager, Depends(get_ws_manager)],
):
    await manager.connect(websocket)

Phase 4: Testing with Overrides

from unittest.mock import Mock

from fastapi.testclient import TestClient

def test_websocket_connection():
    app = create_test_app()
    mock_manager = Mock(spec=WebSocketManager)

    # Override dependency for testing
    app.dependency_overrides[get_ws_manager] = lambda: mock_manager

    client = TestClient(app)
    # Test with mock_manager

Benefits of Migration:

  1. Better Test Isolation - Each test can create its own app with fresh state
  2. Parallel Tests - No shared global state means safe parallel execution
  3. Explicit Dependencies - Dependencies are visible in function signatures
  4. Easy Mocking - Use app.dependency_overrides instead of patching globals
  5. Type Safety - Better IDE support and type checking

Migration Considerations:

  • Requires updating all route functions to use Depends()
  • Need to ensure app state is available in all contexts (background tasks, etc.)
  • Some edge cases around startup order may need careful handling
  • Should be done incrementally, one module at a time

Estimated Effort: Medium (2-3 days per module)

References:

  • FastAPI Lifespan Events: https://fastapi.tiangolo.com/advanced/events/
  • FastAPI Testing with Overrides: https://fastapi.tiangolo.com/advanced/testing-dependencies/
  • Python Dependency Injection Patterns: https://python-dependency-injector.ets-labs.org/


TD-002: Low Severity Issues from Implementation Review

Status: Backlog
Priority: Low
Source: Multi-agent implementation review (2026-02-04)

The following issues were identified during a comprehensive code review but are low priority and can be addressed as time permits:

Documentation (Opportunistic)

These minor documentation issues should be addressed opportunistically when working on related code:

  • Documentation mismatches - Some docs may be slightly out of date
  • API doc coverage gaps - Some endpoints missing OpenAPI descriptions
  • Env var naming mismatches - Minor inconsistencies in environment variable naming

Recommended Approach: Address when working on related code, during refactoring sprints, or when the issue causes user-visible problems.

Note: These are not blocking issues and the system functions correctly. They represent polish improvements.


Resolved Technical Debt

Items that have been addressed are moved here for historical reference.

TD-002-8: PO Parser Escape Sequences (Resolved 2026-02-04)

Comprehensive escape sequence handling was already implemented. Added tests verifying handling of \n, \t, \r, \\, \", \xNN, \uXXXX, \UXXXXXXXX, octal escapes, and C-style escapes (\a, \b, \f, \v).

TD-002-9: Locale Handling Edge Cases (Resolved 2026-02-04)

Added comprehensive tests for locale normalization edge cases including empty strings, whitespace, script codes (zh-Hans, zh-Hant), unknown locales, mixed case handling, and multiple separators.

TD-002-10: conversation_count Property Lock (Resolved 2026-02-04)

The property is already safe without a lock: in asyncio, only one coroutine runs at a time on an event loop (cooperative multitasking), and Python's len() on dict is atomic in CPython, so a read cannot observe a half-updated dict. Added documentation explaining this and tests verifying the behavior.
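The reasoning can be illustrated with a small sketch. ConversationStore, its _conversations dict, and the add() method are hypothetical names for illustration; only conversation_count comes from this document.

```python
import asyncio
from typing import Dict, List


class ConversationStore:
    """Hypothetical holder of conversations keyed by ID."""

    def __init__(self) -> None:
        self._conversations: Dict[str, List[str]] = {}

    @property
    def conversation_count(self) -> int:
        # No lock: a property getter is synchronous, so no other coroutine
        # on this event loop can run while len() is evaluated.
        return len(self._conversations)

    async def add(self, conversation_id: str) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self._conversations[conversation_id] = []


async def main() -> int:
    store = ConversationStore()
    # 100 coroutines interleave at their await points, never inside the getter.
    await asyncio.gather(*(store.add(f"conv-{i}") for i in range(100)))
    return store.conversation_count


count = asyncio.run(main())
```

This argument covers coroutines sharing one event loop. If the object were also accessed from other OS threads, the guarantee would rest on CPython's implementation details rather than on asyncio's scheduling model.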

TD-002-11: Token Estimation Edge Cases (Resolved 2026-02-04)

Added comprehensive edge case tests for token estimation: Unicode text (Russian, Chinese, Arabic), emoji handling, mixed scripts, punctuation-only text, whitespace-only, single characters, newlines/tabs, very long text, and messages missing content keys.

TD-002-12: Callable Conditions in Persistence (Resolved 2026-02-04)

Not an issue: all conditions use serializable string enums (SpawnCondition, ResetCondition, etc.) rather than callables. Pydantic's model_dump_json() would fail loudly if non-serializable data were present. The architecture correctly separates serializable data models from runtime behavior.
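The difference between a string enum and a callable at serialization time can be shown with a standard-library analogue. This sketch uses json.dumps rather than Pydantic's model_dump_json(), and the enum members are invented for illustration since the real SpawnCondition values are not listed here.

```python
import json
from enum import Enum


class SpawnCondition(str, Enum):
    """Hypothetical members; the real enum's values are not shown in this document."""

    ALWAYS = "always"
    NIGHT_ONLY = "night_only"


# A string enum serializes as its plain string value.
record = {"condition": SpawnCondition.NIGHT_ONLY}
encoded = json.dumps(record)

# A callable cannot be serialized, and the failure is loud rather than silent.
try:
    json.dumps({"condition": lambda: True})
    callable_rejected = False
except TypeError:
    callable_rejected = True

# The stored string maps back to the enum member on load.
restored = SpawnCondition(json.loads(encoded)["condition"])
```

Because the condition is stored as data and resolved to behavior at runtime, persisted records stay readable and round-trip cleanly.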


Contributing

When adding new technical debt documentation:

  1. Assign an ID - Use the format TD-XXX with incrementing numbers
  2. Document why - Explain why the pattern exists, not just that it's bad
  3. Provide workarounds - Help current developers work with the code as-is
  4. Plan migration - Include a concrete migration path with code examples
  5. Estimate effort - Give a rough estimate to help with prioritization

When resolving technical debt:

  1. Update this document - Move the item to "Resolved" section
  2. Remove TODO comments - Clean up the source code markers
  3. Update tests - Remove any workarounds that are no longer needed
  4. Document in PR - Reference the tech debt item in your PR description

Last updated: 2025

See also: Exception Handling Policy | Style Guide