
ADR-006: PBKDF2-HMAC-SHA256 for API Key Hashing

Status

Accepted

Date

2024-03-15

Context

MAID's admin API (packages/maid-engine/src/maid_engine/api/) uses API keys for authentication. API keys are generated by the APIKeyManager in packages/maid-engine/src/maid_engine/api/auth.py and must be stored securely. If the key storage (file or document store) is compromised, an attacker should not be able to recover the original API keys.

The initial implementation specification called for plain SHA-256 hashing of API keys. While API keys have higher entropy than user passwords (they are randomly generated), plain SHA-256 has no computational cost factor, meaning a compromised hash can be brute-forced at GPU speeds (billions of hashes per second).

Decision

Use PBKDF2-HMAC-SHA256 with per-key random salts for API key hashing. The implementation in APIKeyManager (in packages/maid-engine/src/maid_engine/api/auth.py) works as follows:

  • Key generation: A raw API key is generated with secrets.token_urlsafe(). A random salt (32 bytes by default) is generated with secrets.token_bytes().
  • Hashing: The key is hashed using hashlib.pbkdf2_hmac("sha256", ...) with the salt and a configurable iteration count.
  • Storage: The APIKey dataclass stores key_hash (hex-encoded PBKDF2 output), key_salt (hex-encoded random salt), and pbkdf2_iterations (the iteration count used at creation time).
  • Validation: validate_key() locates the stored record matching the candidate key's prefix, recomputes the PBKDF2 hash with that record's salt and iteration count, and compares the result against key_hash in constant time using hmac.compare_digest().
  • Configuration: Iteration count (MAID_SECURITY__PBKDF2_ITERATIONS, default 600,000) and salt length (MAID_SECURITY__API_KEY_SALT_LENGTH, default 32 bytes) are configurable via SecuritySettings in packages/maid-engine/src/maid_engine/config/settings.py.
  • Forward compatibility: The iteration count is stored per-key, so keys created with different iteration counts (e.g., after a settings change) can still be validated. Legacy keys without key_salt fall back to plain SHA-256 for backward compatibility.
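The steps above can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the code in auth.py. create_key() and hash_key() are hypothetical helper names. The APIKey dataclass here carries only the three fields this ADR names. validate_key() takes the stored record directly rather than performing the prefix lookup.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Optional

# Defaults mirroring MAID_SECURITY__PBKDF2_ITERATIONS and
# MAID_SECURITY__API_KEY_SALT_LENGTH as described in this ADR.
DEFAULT_ITERATIONS = 600_000
DEFAULT_SALT_LENGTH = 32


@dataclass
class APIKey:
    # Only the fields named in this ADR; the real dataclass likely has more.
    key_hash: str                   # hex-encoded PBKDF2 (or legacy SHA-256) digest
    key_salt: Optional[str] = None  # hex-encoded salt; None marks a legacy key
    pbkdf2_iterations: int = DEFAULT_ITERATIONS


def hash_key(raw_key: str, salt: bytes, iterations: int) -> str:
    """Hypothetical helper: hex-encoded PBKDF2-HMAC-SHA256 digest of the raw key."""
    return hashlib.pbkdf2_hmac("sha256", raw_key.encode(), salt, iterations).hex()


def create_key(iterations: int = DEFAULT_ITERATIONS,
               salt_length: int = DEFAULT_SALT_LENGTH) -> tuple[str, APIKey]:
    """Hypothetical helper: return (raw key shown once to the caller, record to persist)."""
    raw_key = secrets.token_urlsafe()
    salt = secrets.token_bytes(salt_length)
    record = APIKey(
        key_hash=hash_key(raw_key, salt, iterations),
        key_salt=salt.hex(),
        pbkdf2_iterations=iterations,  # stored so later settings changes don't break validation
    )
    return raw_key, record


def validate_key(candidate: str, record: APIKey) -> bool:
    """Recompute the digest with the record's own parameters; compare in constant time."""
    if record.key_salt is None:
        # Legacy path: unsalted SHA-256, kept only for backward compatibility.
        computed = hashlib.sha256(candidate.encode()).hexdigest()
    else:
        computed = hash_key(candidate, bytes.fromhex(record.key_salt),
                            record.pbkdf2_iterations)
    return hmac.compare_digest(computed, record.key_hash)
```

For example, raw, record = create_key(iterations=1_000) followed by validate_key(raw, record) returns True. An iteration count that low is only appropriate in tests.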

Lookup performance is maintained through a prefix index: API keys include a cleartext prefix (16 characters) used for O(1) lookup before the expensive PBKDF2 verification step.
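A prefix index of this kind might look like the following sketch. It makes two hypothetical assumptions: the cleartext prefix is simply the first 16 characters of the raw key, and records live in an in-memory dict. PrefixIndex and StoredKey are illustrative names, not MAID's actual types.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass

PREFIX_LENGTH = 16  # cleartext prefix length given in this ADR


@dataclass
class StoredKey:
    prefix: str      # cleartext, safe to use as an index key
    key_salt: bytes
    key_hash: bytes
    iterations: int


class PrefixIndex:
    """Hypothetical in-memory index mapping cleartext prefix -> stored record."""

    def __init__(self, iterations: int = 600_000) -> None:
        self._by_prefix: dict[str, StoredKey] = {}
        self._iterations = iterations

    def issue(self) -> str:
        while True:
            raw_key = secrets.token_urlsafe()  # 43 chars by default
            prefix = raw_key[:PREFIX_LENGTH]
            if prefix not in self._by_prefix:  # never overwrite on a (very unlikely) collision
                break
        salt = secrets.token_bytes(32)
        digest = hashlib.pbkdf2_hmac("sha256", raw_key.encode(), salt, self._iterations)
        self._by_prefix[prefix] = StoredKey(prefix, salt, digest, self._iterations)
        return raw_key

    def validate(self, candidate: str) -> bool:
        # O(1) dict lookup: unknown prefixes are rejected with no PBKDF2 work at all.
        record = self._by_prefix.get(candidate[:PREFIX_LENGTH])
        if record is None:
            return False
        # At most one PBKDF2 derivation per request, however many keys are stored.
        digest = hashlib.pbkdf2_hmac("sha256", candidate.encode(),
                                     record.key_salt, record.iterations)
        return hmac.compare_digest(digest, record.key_hash)
```

The prefix is stored in cleartext, so under this layout only the characters after it carry secret entropy. With the default 32-byte token, that still leaves 27 characters, or about 160 bits.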

Consequences

Positive

  • Brute-force resistance: At 600,000 PBKDF2 iterations, even a fast GPU can only compute approximately 10,000-50,000 hashes per second (compared to billions for plain SHA-256). This makes offline brute-force attacks impractical even for shorter key spaces.
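The throughput gap is easiest to see as worst-case search time. The sketch below does arithmetic on the rates quoted in this ADR for a single device. It uses 1e9 hashes per second as a conservative stand-in for "billions" of SHA-256 hashes, and 50,000 per second as the upper PBKDF2 estimate. The key-space sizes are arbitrary examples.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Rates from this ADR (single device); 1e9 is a conservative round number for "billions".
SHA256_RATE = 1e9
PBKDF2_RATE = 50_000


def exhaustive_search_years(key_space_bits: int, hashes_per_second: float) -> float:
    """Worst-case years to try every candidate in a space of 2**key_space_bits keys."""
    return 2 ** key_space_bits / hashes_per_second / SECONDS_PER_YEAR


for bits in (40, 48, 64):
    print(f"{bits}-bit key space: "
          f"SHA-256 {exhaustive_search_years(bits, SHA256_RATE):.3g} years, "
          f"PBKDF2 {exhaustive_search_years(bits, PBKDF2_RATE):.3g} years")
```

With these figures, a 48-bit key space falls in about 3 days against plain SHA-256 but takes roughly 178 years at the PBKDF2 rate. Every PBKDF2 figure is 20,000 times the corresponding SHA-256 figure. An attacker with k devices divides both by k.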
  • Per-key salts: Each key has a unique random salt, preventing rainbow table attacks and ensuring identical keys produce different hashes.
  • Industry standard: PBKDF2 is recommended by OWASP and NIST (SP 800-132) for key derivation. The 600,000 iteration default meets OWASP's 2023 minimum recommendation.
  • Standard library: hashlib.pbkdf2_hmac is available in Python's standard library with no additional dependencies.

Negative

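  • Validation latency: Each validation of a key whose prefix matches a stored record requires a full PBKDF2 derivation (approximately 50-200ms depending on hardware and iteration count). The prefix index limits this to at most one derivation per request, instead of one per stored key (which would otherwise be needed because every key has its own salt). It also skips PBKDF2 entirely when no stored key has a matching prefix.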
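The 50-200ms estimate depends on hardware, so it is worth measuring on the deployment target before changing MAID_SECURITY__PBKDF2_ITERATIONS. Below is a rough sketch using only the standard library. measure_pbkdf2_ms() is a hypothetical helper, and it times one derivation on the calling thread, not a full request.

```python
import hashlib
import secrets
import statistics
import time


def measure_pbkdf2_ms(iterations: int = 600_000, repeats: int = 3) -> float:
    """Median wall-clock time (ms) of one PBKDF2-HMAC-SHA256 derivation on this machine."""
    key = secrets.token_urlsafe().encode()
    salt = secrets.token_bytes(32)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.pbkdf2_hmac("sha256", key, salt, iterations)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


if __name__ == "__main__":
    for n in (100_000, 300_000, 600_000):
        print(f"{n:>7} iterations: {measure_pbkdf2_ms(n):.1f} ms")
```

PBKDF2 cost grows linearly with the iteration count, so one measurement gives a reasonable estimate for other counts on the same machine.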
  • Not the strongest option: Argon2 (memory-hard) provides better resistance against GPU/ASIC attacks than PBKDF2. However, Argon2 requires the argon2-cffi package, which has C extension dependencies.
  • Migration complexity: The backward compatibility code for legacy SHA-256 keys adds conditional logic to the validation path. Keys without key_salt use the weaker plain SHA-256 path.

Alternatives Considered

Plain SHA-256

The original specification's approach. Rejected because SHA-256 has no cost factor. A leaked database of SHA-256 hashed API keys could be brute-forced at billions of attempts per second on commodity GPUs.

bcrypt

A well-established password hashing algorithm with built-in salt and cost factor. Rejected because bcrypt has a 72-byte input limit (API keys can be longer), and the Python bcrypt package is a compiled extension that complicates installation on some platforms.
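The 72-byte limit can be checked directly against secrets.token_urlsafe(). That function returns unpadded URL-safe base64, so the key grows by about 1.3 characters per requested byte. The byte counts below are illustrative: this ADR does not state how many bytes MAID requests. Any prefix or separator added to the token would lengthen the key further.

```python
import secrets

BCRYPT_MAX_INPUT_BYTES = 72  # bcrypt only considers the first 72 bytes of its input


def urlsafe_token_length(nbytes: int) -> int:
    """Length of secrets.token_urlsafe(nbytes), i.e. unpadded base64 of nbytes bytes."""
    return len(secrets.token_urlsafe(nbytes))


for nbytes in (32, 48, 64):
    length = urlsafe_token_length(nbytes)
    verdict = "exceeds" if length > BCRYPT_MAX_INPUT_BYTES else "fits within"
    print(f"{nbytes}-byte token -> {length} chars, {verdict} the bcrypt limit")
```

A 32-byte token (43 characters) fits. A 64-byte token (86 characters) does not, and bcrypt would either reject the excess or silently ignore it, depending on the implementation.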

Argon2

The Password Hashing Competition winner, designed to be both CPU-hard and memory-hard. Considered the strongest option. Rejected because the argon2-cffi package requires C compilation, and MAID aims to be installable with pip install on pure-Python environments. If this requirement changes, migrating to Argon2 should be straightforward. Hashing parameters are already stored per key, and the validation path already selects a scheme from stored fields (legacy SHA-256 is chosen when key_salt is absent). Adding an explicit algorithm identifier to each record would therefore let Argon2 and PBKDF2 keys coexist during a migration.
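One way to structure such a migration is to record an algorithm name per key and dispatch on it, inferring the name for records that predate the field. In this sketch, the algorithm field, the verifier names and the StoredKey type are all hypothetical. Only the PBKDF2 and legacy SHA-256 behavior comes from this ADR. No Argon2 verifier is included, since that would require argon2-cffi.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StoredKey:
    key_hash: str                    # hex digest, as in the APIKey record
    key_salt: Optional[str] = None   # hex salt; absent on legacy keys
    pbkdf2_iterations: int = 600_000
    algorithm: Optional[str] = None  # hypothetical new field; None = pre-migration record


def verify_pbkdf2(candidate: str, rec: StoredKey) -> bool:
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode(),
                                 bytes.fromhex(rec.key_salt), rec.pbkdf2_iterations)
    return hmac.compare_digest(digest.hex(), rec.key_hash)


def verify_legacy_sha256(candidate: str, rec: StoredKey) -> bool:
    return hmac.compare_digest(hashlib.sha256(candidate.encode()).hexdigest(), rec.key_hash)


# Each verifier owns its comparison, so a scheme that checks its own encoded
# hash format (as Argon2 libraries typically do) can be added as one more entry.
VERIFIERS: dict[str, Callable[[str, StoredKey], bool]] = {
    "pbkdf2-sha256": verify_pbkdf2,
    "sha256-legacy": verify_legacy_sha256,
}


def algorithm_of(rec: StoredKey) -> str:
    """Use the explicit tag if present; otherwise infer it from key_salt as today."""
    if rec.algorithm is not None:
        return rec.algorithm
    return "pbkdf2-sha256" if rec.key_salt else "sha256-legacy"


def validate(candidate: str, rec: StoredKey) -> bool:
    return VERIFIERS[algorithm_of(rec)](candidate, rec)
```

Inferring the algorithm for untagged records means existing keys need no data migration. New records would simply be written with an explicit tag.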