Production Backlog
This page stores production hardening items that are useful but not required for the current release gate. Revisit it during release planning.
Supply Chain
Add Dependabot or Renovate for Python dependencies and GitHub Actions.
Emit a CycloneDX or SPDX SBOM on each GitHub Release.
Decide whether pip-audit findings may ever use an allow-list, or keep the current fail-closed policy and document it explicitly.
Add CODEOWNERS for security-sensitive surfaces under scripts/security/, deploy/, and IRB/auditor documentation.
Add issue and pull request templates, including a private security triage path for suspected PHI exposure.
Runtime Resilience
Add hosted-LLM retry/backoff and circuit-breaker behavior for OpenAI, Anthropic, Google, and NVIDIA provider calls.
Add per-session token and estimated-cost ceilings with operator alerts for runaway agent loops.
Add provider-side spend alarms and document who owns them.
Add load and latency-regression checks with a concrete p95 target.
Replace the local filesystem pipeline lock with a distributed lock before running multiple portal instances against shared output storage.
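The retry/backoff and circuit-breaker item could start from something like the sketch below. It is a minimal illustration, not the project's implementation: CircuitBreaker, CircuitOpenError, call_with_retry, and every default value are hypothetical, and the retryable exception types would need to be mapped to whatever each provider SDK actually raises.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Opens after max_failures consecutive failures; allows a trial call after reset_after seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_retry(fn, breaker, attempts=4, base_delay=0.5, max_delay=8.0,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn() with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("provider circuit is open")
        try:
            result = fn()
        except retry_on:
            breaker.record_failure()
            if attempt == attempts - 1:
                raise
            # Full jitter: sleep a random duration up to the capped exponential delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
        else:
            breaker.record_success()
            return result
```

Keeping one breaker per provider would stop an outage at one vendor from blocking calls to the others. The injectable clock and sleep arguments exist only so the behavior can be tested without real delays.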
Security Headers
Wire a reachable CSP violation report sink for deployed Nginx environments.
Tighten Streamlit CSP allowances as framework runtime requirements permit, especially inline script and eval allowances.
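Whatever receives the CSP reports will need to reduce each one to a small, safe log record. The sketch below assumes the legacy report-uri payload, a JSON object with a top-level "csp-report" key; it does not handle the newer Reporting API format. The function name, the kept fields, and the 200-character cap are illustrative choices, not an existing project interface.

```python
import json

# Fields from the legacy report-uri payload worth keeping in logs.
KEPT_FIELDS = ("document-uri", "violated-directive", "effective-directive", "blocked-uri")


def summarize_csp_report(body: bytes) -> dict:
    """Return a small loggable summary of a legacy CSP report, or {} if malformed."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    report = payload.get("csp-report") if isinstance(payload, dict) else None
    if not isinstance(report, dict):
        return {}
    summary = {}
    for field in KEPT_FIELDS:
        value = report.get(field)
        if isinstance(value, str):
            # Drop query strings so tokens or identifiers in URLs are not logged.
            summary[field] = value.split("?", 1)[0][:200]
    return summary
```

Fields such as script-sample are deliberately left out, since they can carry page content that should not reach logs in a PHI-handling deployment.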
Observability
Ship an example remote log/error sink configuration such as Loki/Promtail or Sentry, with the exact event names operators should alert on.
Roll up telemetry JSONL into daily usage, token, cost, and anomaly summaries.
Surface PHI redactor internal exceptions as a metric without leaking raw event text.
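The daily roll-up item might look roughly like the sketch below. The telemetry schema is not specified on this page, so the field names ts, tokens, and cost_usd are assumptions, as is the roll_up_daily function itself. Anomaly detection is out of scope here; the sketch only produces the per-day totals an anomaly check could consume.

```python
import json
from collections import defaultdict
from datetime import datetime


def roll_up_daily(lines):
    """Aggregate JSONL telemetry lines into per-day event, token, and cost totals."""
    days = defaultdict(lambda: {"events": 0, "tokens": 0, "cost_usd": 0.0, "malformed": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
            # Assumed schema: "ts" is an ISO 8601 timestamp string.
            day = datetime.fromisoformat(event["ts"]).date().isoformat()
            tokens = int(event.get("tokens", 0))
            cost = float(event.get("cost_usd", 0.0))
        except (ValueError, KeyError, TypeError):
            # Count unparseable lines under a sentinel key rather than dropping them silently.
            days["unknown"]["malformed"] += 1
            continue
        bucket = days[day]
        bucket["events"] += 1
        bucket["tokens"] += tokens
        bucket["cost_usd"] += cost
    return dict(days)
```

Days are bucketed by the date in each timestamp as written, so mixed UTC offsets would need normalizing first. Timestamps with a trailing "Z" are only accepted by datetime.fromisoformat on Python 3.11 and later.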
Data Retention
Decide whether production output bundles, audit files, and conversation logs require application-layer encryption in addition to encrypted host volumes.
Add a conversation retention command with max-age and max-count controls.
Decide whether reviewed snapshots should remain single-slot or rotate by timestamp before overwrite.
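As a starting point for the retention command, the sketch below selects files by age and by count, with a dry-run default. It assumes one file per conversation and uses modification time as the age signal; the function names, the "*.jsonl" pattern, and the parameters are hypothetical and would need to match the real log layout.

```python
import time
from pathlib import Path


def select_expired(paths, max_age_days=None, max_count=None, now=None):
    """Return files older than max_age_days, plus the oldest files beyond max_count."""
    now = time.time() if now is None else now
    # Newest first, so the max_count newest files are the ones retained.
    ordered = sorted(paths, key=lambda p: p.stat().st_mtime, reverse=True)
    expired = []
    for index, path in enumerate(ordered):
        too_old = max_age_days is not None and now - path.stat().st_mtime > max_age_days * 86400
        over_count = max_count is not None and index >= max_count
        if too_old or over_count:
            expired.append(path)
    return expired


def prune(directory, pattern="*.jsonl", dry_run=True, **limits):
    """Delete expired files under directory; with dry_run=True only report them."""
    expired = select_expired(list(Path(directory).glob(pattern)), **limits)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Defaulting to dry_run=True means an operator has to opt in to deletion, which seems the safer default for audit-relevant data.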
Deployment Packaging
Add an OCI image for immutable deployment where operators cannot rely on uv sync against live package indexes.
Add a data-flow diagram in docs/sphinx/irb_auditor/ showing browser, proxy, Streamlit, local files, and hosted LLM provider egress.
Define SLO, uptime, and error-budget targets for deployments that need a support contract.