Production Readiness

This page is the release and deployment runbook for a controlled single-study RePORT AI Portal instance. It does not replace study-team validation of real data, but it defines the technical controls required before operators treat a build as production-ready.

Scope

RePORT AI Portal is local-first and single-study-focused. The default supported production posture is:

one study selected by STUDY_NAME;
one reviewed snapshot baseline under data/snapshots/{STUDY_NAME}/;
the live assistant reading only output/{STUDY_NAME}/trio_bundle/ and output/{STUDY_NAME}/agent/;
no public unauthenticated access.
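The read boundary in the list above can be pictured as a path check. The following is a minimal sketch, not the portal's actual zone guard: the function names and the base-directory argument are assumptions made for illustration.

```python
from pathlib import Path


def allowed_read_roots(study_name: str, base: Path) -> list[Path]:
    """Return the two directories the live assistant may read (hypothetical helper)."""
    out = base / "output" / study_name
    return [(out / "trio_bundle").resolve(), (out / "agent").resolve()]


def is_read_allowed(path: Path, study_name: str, base: Path) -> bool:
    """True only if `path` resolves inside one of the allowed roots."""
    # Resolve first so that ".." segments cannot escape an allowed root.
    target = (base / path).resolve()
    return any(target.is_relative_to(root) for root in allowed_read_roots(study_name, base))
```

Resolving before comparing is the design point: a path such as output/{STUDY_NAME}/trio_bundle/../audit/ is rejected because it resolves outside both roots.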

Production access for more than one user must sit behind an explicit network and authentication boundary. Streamlit is the application server, not the public security perimeter.

Release Gate

Run the full release gate from a clean checkout:

make release-check

This expands to:

verify → typecheck → test-all → docs-ci → security

The gate must pass before tagging or deploying. The docs-ci step includes the warnings-as-errors Sphinx build and the external link check; the security step runs the dependency vulnerability audit. An audit skip for the local project package itself is acceptable; vulnerability findings in third-party dependencies are not.

The release workflow must tag immutable releases as vX.Y.Z and attach build artifacts to the GitHub Release. Deployments should pin to a tag, not to a moving branch.
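A deployment script could refuse anything that is not a vX.Y.Z tag. This is a minimal sketch under two assumptions that the runbook does not state: numeric components carry no leading zeros, and pre-release suffixes are not accepted. The function name is hypothetical.

```python
import re

# Assumed shape: "v" followed by three dot-separated non-negative integers.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_tag(ref: str) -> bool:
    """True if `ref` looks like an immutable release tag rather than a branch name."""
    return TAG_PATTERN.fullmatch(ref) is not None
```

A check like this rejects moving references such as main before any deployment step runs.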

Deployment Boundary

For local workstation use, run:

make chat

The checked-in Streamlit configuration binds to 127.0.0.1:8501 with CORS and XSRF protection enabled. Do not disable these settings to make a proxy work; fix the proxy configuration instead.

For shared use, place the app behind a reverse proxy that provides:

HTTPS/TLS termination;
authentication before traffic reaches Streamlit;
an allow-list or VPN/private-network boundary where appropriate;
WebSocket proxying for Streamlit sessions;
security headers at the proxy layer.

Production services must set REPORT_AI_AUTH_MODE=proxy and a long random REPORT_AI_PROXY_SHARED_SECRET through the deployment secret store. The proxy must set both X-Forwarded-User and X-Report-AI-Proxy-Secret. Missing or mismatched values stop the app before the PHI-capable UI renders.
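The header check described above might look roughly like the following. This is a sketch, not the portal's implementation: the function name is hypothetical, and it assumes the headers arrive as a plain mapping keyed by the exact header names.

```python
import hmac
from typing import Mapping, Optional


def authenticated_user(headers: Mapping[str, str], shared_secret: str) -> Optional[str]:
    """Return the proxied user name, or None if the request must be rejected."""
    if not shared_secret:
        # An unset secret must never match an empty header.
        return None
    presented = headers.get("X-Report-AI-Proxy-Secret", "")
    user = headers.get("X-Forwarded-User", "").strip()
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(presented.encode(), shared_secret.encode()):
        return None
    return user or None
```

A caller would stop rendering whenever the result is None, which covers a missing secret, a mismatched secret, and a missing user header.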

Production services must also set REPORT_AI_PRODUCTION=1 and REPORT_AI_REQUIRE_PHI_LOG_REDACTOR=1. With those flags, missing or unreadable PHI redaction keys are startup failures, not warnings. Local developer runs may still warn and continue before the first study load provisions a key.
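The difference between the production and developer behaviour can be sketched as a startup check. The function name, the return values, and the key path argument are assumptions for illustration; only the two environment variable names come from this page.

```python
import os
from pathlib import Path
from typing import Mapping


def check_phi_key(key_path: Path, env: Mapping[str, str]) -> str:
    """Return "ok" or "warn"; raise when production flags require the key."""
    strict = (
        env.get("REPORT_AI_PRODUCTION") == "1"
        and env.get("REPORT_AI_REQUIRE_PHI_LOG_REDACTOR") == "1"
    )
    readable = key_path.is_file() and os.access(key_path, os.R_OK)
    if readable:
        return "ok"
    if strict:
        # Production: a missing or unreadable key is a startup failure.
        raise RuntimeError(f"PHI redaction key missing or unreadable: {key_path}")
    # Local developer run: warn and continue.
    return "warn"
```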

Set STUDY_NAME explicitly for deployments. Study auto-detection falls back to the default name when data/raw/ is absent or when the deployment uses only reviewed snapshots or scrubbed output. If an environment must fail when auto-detection cannot find raw study input, set REPORT_AI_STRICT_STUDY_DETECTION=1.
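The resolution order can be illustrated as follows. This is a sketch only: the detection rule (first subdirectory of data/raw/ by name), the default name "default_study", and the function name are all assumptions, since this page does not describe how auto-detection works internally.

```python
from pathlib import Path
from typing import Mapping


def resolve_study_name(env: Mapping[str, str], raw_root: Path,
                       default: str = "default_study") -> str:
    """Prefer an explicit STUDY_NAME, then detection, then the default or a failure."""
    explicit = env.get("STUDY_NAME", "").strip()
    if explicit:
        return explicit
    # Assumed detection rule: the first study directory under data/raw/.
    candidates = sorted(p.name for p in raw_root.iterdir() if p.is_dir()) if raw_root.is_dir() else []
    if candidates:
        return candidates[0]
    if env.get("REPORT_AI_STRICT_STUDY_DETECTION") == "1":
        raise RuntimeError("no raw study input found and strict detection is enabled")
    return default
```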

The Nginx template applies conservative per-client request throttles. Keep the app-layer chat turn ceiling enabled as a second guard: at most CHAT_RATE_LIMIT_MAX_TURNS chat turns per window of CHAT_RATE_LIMIT_WINDOW_SECONDS seconds.
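One common way to enforce a turn ceiling is a sliding window over recent timestamps. The class below is a sketch of that technique, not the portal's limiter; the class name is hypothetical, and the two constructor arguments stand in for CHAT_RATE_LIMIT_MAX_TURNS and CHAT_RATE_LIMIT_WINDOW_SECONDS.

```python
from collections import deque


class TurnLimiter:
    """Sliding-window limiter: at most `max_turns` turns per `window_seconds`."""

    def __init__(self, max_turns: int, window_seconds: float) -> None:
        self.max_turns = max_turns
        self.window_seconds = window_seconds
        self._stamps = deque()  # timestamps of accepted turns, oldest first

    def allow(self, now: float) -> bool:
        """Record and accept a turn at time `now`, or reject it if the window is full."""
        cutoff = now - self.window_seconds
        # Drop turns that have aged out of the window.
        while self._stamps and self._stamps[0] <= cutoff:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_turns:
            return False
        self._stamps.append(now)
        return True
```

In a real session the caller would pass time.monotonic() as `now`; taking the time as an argument keeps the sketch easy to test.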

The repository includes starting templates:

deploy/nginx/report-ai-portal.conf.example — Nginx reverse proxy with OAuth2 Proxy hook, TLS redirect, WebSocket forwarding, and security headers.
deploy/nginx/report-ai-portal-proxy-secret.conf.example — root-only Nginx snippet containing the proxy shared-secret header. Keep this outside the broadly-readable site config and make it match REPORT_AI_PROXY_SHARED_SECRET.
deploy/systemd/report-ai-portal.service.example — Linux service unit with direct virtualenv execution, narrow writable paths, and process hardening. Build the virtualenv during deployment, then start the service; do not let systemd invoke uv run as the long-lived runtime.
deploy/systemd/report-ai-portal-healthcheck.*.example — timer-driven healthcheck that restarts the service when /_stcore/health fails.

Review the examples before use. Replace hostnames, certificate paths, service users, writable paths, OAuth configuration, and CSP reporting endpoints for the deployed environment.
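When adapting the healthcheck template, the probe itself can be as small as the sketch below. It assumes the default local bind address from this page; the function name is hypothetical, and the restart action is left to the systemd timer unit.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://127.0.0.1:8501/_stcore/health",
               timeout: float = 5.0) -> bool:
    """True if the Streamlit health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as unhealthy.
        return False
```

A wrapper script could exit non-zero when this returns False so that the timer-driven unit can trigger the restart.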

Before enabling the systemd unit, install dependencies into the checked-out virtualenv:

cd /opt/report-ai-portal
uv sync --frozen --group web --group ai_assistant --group llm

Security Headers

The proxy should set browser security headers following the OWASP Secure Headers Project. At minimum:

Strict-Transport-Security on HTTPS deployments;
X-Content-Type-Options: nosniff;
X-Frame-Options: DENY or an equivalent frame-ancestors CSP;
Referrer-Policy;
Permissions-Policy denying unused browser capabilities;
a tested Content-Security-Policy.

The Nginx template ships an enforcing CSP baseline with Streamlit’s required inline runtime allowances. Exercise the full wizard and chat workflow after proxy changes, inspect browser console and CSP reports, and add only exact origins required by the deployed environment.
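After a proxy change, a quick script can confirm that the minimum header set is still present. This sketch checks presence only, plus the nosniff value and the frame-protection alternative; it does not judge whether a CSP is well formed. The function name is hypothetical.

```python
from typing import Mapping

REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "Referrer-Policy",
    "Permissions-Policy",
    "Content-Security-Policy",
)


def missing_security_headers(headers: Mapping[str, str]) -> list[str]:
    """Return the names of required security headers that are absent or wrong."""
    present = {name.lower(): value for name, value in headers.items()}
    missing = [name for name in REQUIRED_HEADERS if name.lower() not in present]
    nosniff = present.get("x-content-type-options", "").strip().lower()
    if nosniff != "nosniff" and "X-Content-Type-Options" not in missing:
        missing.append("X-Content-Type-Options")
    # Frame protection may come from X-Frame-Options or a frame-ancestors CSP directive.
    csp = present.get("content-security-policy", "").lower()
    if "x-frame-options" not in present and "frame-ancestors" not in csp:
        missing.append("X-Frame-Options or frame-ancestors")
    return missing
```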

Monitoring

Set LOG_FORMAT=json and LOG_DIR for deployed services. Monitor:

service start, stop, restart, and non-zero exit events;
"PHI log redactor NOT installed" warnings;
pipeline failures and preserved tmp/{STUDY_NAME}/ staging trees;
snapshot restore failures;
hosted LLM API errors and provider fallback events;
dependency-audit failures in CI;
unexpected reads or writes rejected by zone guards.

Alerting should page an operator for PHI-control failures, repeated pipeline failures, or any public exposure of the Streamlit port without the proxy/auth boundary.
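For readers unfamiliar with structured logging, the sketch below shows what one JSON log line per event can look like with the standard logging module. The field names (ts, level, logger, message, exc) and the class name are assumptions; the portal's actual LOG_FORMAT=json schema is not described on this page.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical schema)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

One object per line lets a monitoring pipeline match on level and message text, for example to alert on the redactor warning listed above.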

Backups and Restore

Back up only intentional durable state:

data/raw/{STUDY_NAME}/ if the study team permits raw-data backup;
data/snapshots/{STUDY_NAME}/ after human review;
output/{STUDY_NAME}/audit/ for lineage and compliance evidence;
output/{STUDY_NAME}/agent/conversations/ if conversation retention is approved;
the sidecar PHI key at config.PHI_KEY_PATH.

Do not back up .venv/, tmp/, .pytest_cache/, .mypy_cache/, .ruff_cache/, or generated docs build output.

Backups that contain raw data, snapshots, audit files, conversations, or the PHI key must be encrypted at rest and access-controlled. The PHI key must be backed up separately from raw data when policy requires separation of duties.
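The conditional parts of the backup list can be made explicit in a backup script. This is a sketch with a hypothetical function name and flags; it leaves out the PHI key because the value of config.PHI_KEY_PATH depends on the deployment.

```python
def backup_targets(study_name: str, *, raw_permitted: bool,
                   conversations_approved: bool) -> list[str]:
    """Return the durable directories to back up for one study."""
    targets = [
        f"data/snapshots/{study_name}/",
        f"output/{study_name}/audit/",
    ]
    if raw_permitted:
        # Only when the study team permits raw-data backup.
        targets.insert(0, f"data/raw/{study_name}/")
    if conversations_approved:
        # Only when conversation retention is approved.
        targets.append(f"output/{study_name}/agent/conversations/")
    return targets
```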

Restore drills are mandatory before production use:

restore the PHI key;
restore data/snapshots/{STUDY_NAME}/;
run make restore-study;
launch the app with make chat;
confirm the assistant reads the restored trio_bundle/ and not data/snapshots/ directly.

Run the non-destructive automated drill before hand-off:

make restore-drill

Secret and Key Rotation

Hosted LLM API keys are provider secrets. Rotate them through the provider console and update only the deployment secret store or local session input. Never commit them.

The PHI HMAC key is different: it defines stable pseudonyms and shifted dates, so rotating it changes every derived identifier. To rotate the PHI key:

stop the app and pipeline jobs;
archive the old key according to study policy;
create the replacement key through the developer/operator path;
run a full re-ingestion from raw data;
rebuild and review the snapshot baseline;
document the rotation in the study operations log.

Do not mix artifacts generated with different PHI keys in one reviewed snapshot.
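The reason artifacts from different keys cannot be mixed is visible in a keyed-hash pseudonym. The sketch below uses HMAC-SHA256 truncated to 16 hex characters, which is an assumption for illustration; this page does not specify the portal's actual derivation, and the function name is hypothetical.

```python
import hashlib
import hmac


def pseudonym(key: bytes, subject_id: str) -> str:
    """Derive a stable pseudonym for `subject_id` under `key` (illustrative scheme)."""
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]
```

The same key always maps a subject to the same pseudonym, while a new key maps that subject to an unrelated value, so records produced before and after a rotation no longer join.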

Incident Response

For suspected PHI exposure, leaked credentials, public app exposure, or incorrect bundle publication:

stop the service or block access at the proxy;
preserve logs, audit files, and the current output/{STUDY_NAME}/ tree;
rotate hosted API keys if they may have been exposed;
quarantine the affected trio_bundle/ and snapshot baseline;
identify whether raw, staging, audit, snapshot, or agent zones were exposed;
notify the PI/privacy owner under the study’s IRB/IEC incident process;
rebuild from raw data only after the root cause is fixed and reviewed;
record the corrective action before restoring service.

Operational Non-Negotiables

Never expose Streamlit directly to the internet.
Never disable CORS or XSRF protection in production.
Never let the LLM read data/raw/, tmp/, audit/, or data/snapshots/ directly.
Never treat a snapshot as valid until the study team has reviewed it.
Never rotate the PHI key without full re-ingestion.
Never deploy a build that fails make release-check.