Production Readiness ==================== This page is the release and deployment runbook for a controlled single-study RePORT AI Portal instance. It does not replace study-team validation of real data, but it defines the technical controls required before operators treat a build as production-ready. Scope ----- RePORT AI Portal is local-first and single-study-focused. The default supported production posture is: * one study selected by ``STUDY_NAME``; * one reviewed snapshot baseline under ``data/snapshots/{STUDY_NAME}/``; * the live assistant reading only ``output/{STUDY_NAME}/trio_bundle/`` and ``output/{STUDY_NAME}/agent/``; * no public unauthenticated access. Production access for more than one user must sit behind an explicit network and authentication boundary. Streamlit is the application server, not the public security perimeter. Release Gate ------------ Run the full release gate from a clean checkout: .. code-block:: bash make release-check This expands to: .. code-block:: text verify → typecheck → test-all → docs-ci → security The gate must pass before tagging or deploying. ``docs-ci`` includes the warnings-as-errors Sphinx build and external link check. ``security`` runs the dependency vulnerability audit. A dependency that cannot be audited because it is the local project package is acceptable; third-party vulnerability findings are not. The release workflow must tag immutable releases as ``vX.Y.Z`` and attach build artifacts to the GitHub Release. Deployments should pin to a tag, not to a moving branch. Deployment Boundary ------------------- For local workstation use, run: .. code-block:: bash make chat The checked-in Streamlit configuration binds to ``127.0.0.1:8501`` with CORS and XSRF protection enabled. Do not disable these settings to make a proxy work; fix the proxy configuration instead. For shared use, place the app behind a reverse proxy that provides: * HTTPS/TLS termination; * authentication before traffic reaches Streamlit; * an allow-list or VPN/private-network boundary where appropriate; * WebSocket proxying for Streamlit sessions; * security headers at the proxy layer. Production services must set ``REPORT_AI_AUTH_MODE=proxy`` and a long random ``REPORT_AI_PROXY_SHARED_SECRET`` through the deployment secret store. The proxy must set both ``X-Forwarded-User`` and ``X-Report-AI-Proxy-Secret``. Missing or mismatched values stop the app before the PHI-capable UI renders. Production services must also set ``REPORT_AI_PRODUCTION=1`` and ``REPORT_AI_REQUIRE_PHI_LOG_REDACTOR=1``. With those flags, missing or unreadable PHI redaction keys are startup failures, not warnings. Local developer runs may still warn and continue before the first study load provisions a key. Set ``STUDY_NAME`` explicitly for deployments. Study auto-detection falls back to the default name when ``data/raw/`` is absent or the deployment uses only reviewed snapshots / scrubbed output. If an environment must fail when auto-detection cannot find raw study input, set ``REPORT_AI_STRICT_STUDY_DETECTION=1``. The Nginx template applies conservative per-client request throttles. Keep the app-layer chat turn ceiling enabled as a second guard: ``CHAT_RATE_LIMIT_MAX_TURNS`` requests per ``CHAT_RATE_LIMIT_WINDOW_SECONDS``. The repository includes starting templates: * ``deploy/nginx/report-ai-portal.conf.example`` — Nginx reverse proxy with OAuth2 Proxy hook, TLS redirect, WebSocket forwarding, and security headers. * ``deploy/nginx/report-ai-portal-proxy-secret.conf.example`` — root-only Nginx snippet containing the proxy shared-secret header. Keep this outside the broadly-readable site config and make it match ``REPORT_AI_PROXY_SHARED_SECRET``. * ``deploy/systemd/report-ai-portal.service.example`` — Linux service unit with direct virtualenv execution, narrow writable paths, and process hardening. Build the virtualenv during deployment, then start the service; do not let systemd invoke ``uv run`` as the long-lived runtime. * ``deploy/systemd/report-ai-portal-healthcheck.*.example`` — timer-driven healthcheck that restarts the service when ``/_stcore/health`` fails. Review the examples before use. Replace hostnames, certificate paths, service users, writable paths, OAuth configuration, and CSP reporting endpoints for the deployed environment. Before enabling the systemd unit, install dependencies into the checked-out virtualenv: .. code-block:: bash cd /opt/report-ai-portal uv sync --frozen --group web --group ai_assistant --group llm Security Headers ---------------- The proxy should set browser security headers following the `OWASP Secure Headers Project `_. At minimum: * ``Strict-Transport-Security`` on HTTPS deployments; * ``X-Content-Type-Options: nosniff``; * ``X-Frame-Options: DENY`` or an equivalent ``frame-ancestors`` CSP; * ``Referrer-Policy``; * ``Permissions-Policy`` denying unused browser capabilities; * a tested ``Content-Security-Policy``. The Nginx template ships an enforcing CSP baseline with Streamlit's required inline runtime allowances. Exercise the full wizard and chat workflow after proxy changes, inspect browser console and CSP reports, and add only exact origins required by the deployed environment. Monitoring ---------- Set ``LOG_FORMAT=json`` and ``LOG_DIR`` for deployed services. Monitor: * service start, stop, restart, and non-zero exit events; * ``PHI log redactor NOT installed`` warnings; * pipeline failures and preserved ``tmp/{STUDY_NAME}/`` staging trees; * snapshot restore failures; * hosted LLM API errors and provider fallback events; * dependency-audit failures in CI; * unexpected reads or writes rejected by zone guards. Alerting should page an operator for PHI-control failures, repeated pipeline failures, or any public exposure of the Streamlit port without the proxy/auth boundary. Backups and Restore ------------------- Back up only intentional durable state: * ``data/raw/{STUDY_NAME}/`` if the study team permits raw-data backup; * ``data/snapshots/{STUDY_NAME}/`` after human review; * ``output/{STUDY_NAME}/audit/`` for lineage and compliance evidence; * ``output/{STUDY_NAME}/agent/conversations/`` if conversation retention is approved; * the sidecar PHI key at ``config.PHI_KEY_PATH``. Do not back up ``.venv/``, ``tmp/``, ``.pytest_cache/``, ``.mypy_cache/``, ``.ruff_cache/``, or generated docs build output. Backups that contain raw data, snapshots, audit files, conversations, or the PHI key must be encrypted at rest and access-controlled. The PHI key must be backed up separately from raw data when policy requires separation of duties. Restore drills are mandatory before production use: 1. restore the PHI key; 2. restore ``data/snapshots/{STUDY_NAME}/``; 3. run ``make restore-study``; 4. launch ``make chat``; 5. confirm the assistant reads the restored ``trio_bundle/`` and not ``data/snapshots/`` directly. Run the non-destructive automated drill before hand-off: .. code-block:: bash make restore-drill Secret and Key Rotation ----------------------- Hosted LLM API keys are provider secrets. Rotate them through the provider console and update only the deployment secret store or local session input. Never commit them. The PHI HMAC key is different: it defines stable pseudonyms and shifted dates. Rotating it changes derived identifiers. A PHI-key rotation requires: 1. stop the app and pipeline jobs; 2. archive the old key according to study policy; 3. create the replacement key through the developer/operator path; 4. run a full re-ingestion from raw data; 5. rebuild and review the snapshot baseline; 6. document the rotation in the study operations log. Do not mix artifacts generated with different PHI keys in one reviewed snapshot. Incident Response ----------------- For suspected PHI exposure, leaked credentials, public app exposure, or incorrect bundle publication: 1. stop the service or block access at the proxy; 2. preserve logs, audit files, and the current ``output/{STUDY_NAME}/`` tree; 3. rotate hosted API keys if they may have been exposed; 4. quarantine the affected ``trio_bundle/`` and snapshot baseline; 5. identify whether raw, staging, audit, snapshot, or agent zones were exposed; 6. notify the PI/privacy owner under the study's IRB/IEC incident process; 7. rebuild from raw data only after the root cause is fixed and reviewed; 8. record the corrective action before restoring service. Operational Non-Negotiables --------------------------- * Never expose Streamlit directly to the internet. * Never disable CORS or XSRF protection in production. * Never let the LLM read ``data/raw/``, ``tmp/``, ``audit/``, or ``data/snapshots/`` directly. * Never treat a snapshot as valid until the study team has reviewed it. * Never rotate the PHI key without full re-ingestion. * Never deploy a build that fails ``make release-check``. External References ------------------- * `Streamlit configuration `_ * `OWASP Secure Headers Project `_