Tech Stack

Every runtime and development dependency, grouped by role, with one paragraph each on what it is, why it was chosen, and how the project uses it. Pinned versions and rationale live in pyproject.toml.

Runtime — language and tooling

Python 3.11+

What. The host language. Why. Required for concurrent.futures clean shutdown semantics, asyncio.timeout, and the X | Y union syntax used throughout the codebase. How. pyproject.toml pins requires-python = ">=3.11"; CI matrix runs against 3.11 / 3.12 / 3.13.

uv

What. A Rust-based pip / poetry / pipx replacement. Why. 10-100× faster lockfile resolution; reproducible environments. How. Project-wide convention: uv sync --all-groups to install, uv run to invoke. The Makefile assumes uv; CI installs it via the official astral-sh/setup-uv action, pinned to an immutable commit SHA.

Ruff

What. A fast Rust-based Python linter + formatter. Why. Single tool that replaces flake8, isort, pyupgrade, and includes S (flake8-bandit) security rules. How. Configuration at pyproject.toml:215-241 (the [tool.ruff.lint] section). S101 (assert) is per-file-ignored for tests/, since bare assert is the pytest idiom; S603 (subprocess) is waived at our hardened subprocess.run call sites with # noqa: S603.
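
A hedged sketch of what a "hardened subprocess.run call site" carrying a # noqa: S603 waiver can look like: fixed executable, explicit argument list, shell=False, a timeout, and check=True. run_checked is an illustrative name, not the project's helper.

```python
import subprocess
import sys

def run_checked(args: list[str]) -> str:
    # Hardened pattern: allow-listed executable, no shell, bounded runtime.
    result = subprocess.run(  # noqa: S603  (args come from a fixed allow-list)
        [sys.executable, "-c", *args],
        capture_output=True,
        text=True,
        timeout=30,
        check=True,
        shell=False,
    )
    return result.stdout

print(run_checked(["print('ok')"]).strip())  # → ok
```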

mypy

What. Static type checker. Why. Catches a class of LLM-flow bugs such as nullable provider names reaching SDK constructors. How. pyproject.toml configures ignore_missing_imports = true so optional deps don’t block; custom stubs live in typings/ for google.genai and anthropic.

Pytest

What. Test runner. Why. Mature ecosystem, conftest.py fixtures, deterministic markers. How. The test-file conventions are covered in Testing. make test runs the deterministic subset that excludes the AI Assistant construction smokes; make test-all runs the full suite.

Runtime — pipeline

pandas

What. Tabular dataframe library. Why. Excel reading, JSONL output, dataset cleanup, k-anonymity equivalence-class lookups all ride on pandas. How. scripts.extraction.dataset_pipeline reads the raw Excel into a DataFrame; per-row records are serialised to JSONL with the provenance dict.
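
A hedged sketch of the Excel → JSONL leg. In the real pipeline the frame comes from pd.read_excel inside scripts.extraction.dataset_pipeline; a literal frame stands in here, and the provenance dict shown is illustrative.

```python
import json

import pandas as pd

# Stand-in for pd.read_excel("raw.xlsx") in the real pipeline.
df = pd.DataFrame({"subject_id": ["S001", "S002"], "value": [1.5, 2.0]})

# Serialise per-row records to JSONL, attaching a provenance dict to each.
lines = []
for i, record in enumerate(df.to_dict(orient="records")):
    record["provenance"] = {"source": "raw.xlsx", "row": i}
    lines.append(json.dumps(record))
jsonl = "\n".join(lines)
```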

openpyxl

What. Excel .xlsx reader/writer. Why. pandas’s default .xlsx engine. How. Used implicitly by pd.read_excel for the dictionary + dataset legs.

pypdf

What. Lightweight PDF text extractor. Why. Powers the legacy raw-PDF API path. How. Used in scripts.extraction.extract_pdf_data when the operator opts into the gated raw-PDF API path with the two-part attestation.

pdfplumber

What. Layout-aware PDF extractor. Why. Per-character bounding boxes give better structure recovery than pypdf for complex multi-section CRFs. How. pdfplumber is the always-on code path inside the two-way PDF orchestrator (scripts.extraction.pdf_pipeline). Extracted text is PHI-redacted before any LLM call; the LLM response is merged with the code candidate via _merge.

PyYAML

What. YAML parser. Why. The PHI scrub catalog (scripts/security/phi_scrub.yaml) and the study-knowledge overlay (config/study_knowledge.yaml) ship as YAML so domain experts can edit without touching code. How. Loaded once at import time; cached.

Runtime — agent

LangChain + LangGraph

What. LLM-agent framework. Why. init_chat_model gives provider-agnostic construction (Anthropic / OpenAI / Google / Ollama / NVIDIA all behind one API); LangGraph’s ReAct prebuilt is the agent topology. How. scripts.ai_assistant.agent_graph is the only module that constructs an LLM client; every client takes api_key= as an explicit kwarg sourced from the in-memory KeyStore — no os.environ lookup at construction time.
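
A hedged sketch of the construction discipline described above: the client factory receives api_key explicitly from an in-memory key store, with no os.environ lookup at construction time. KeyStore and make_client are illustrative names, not the project's API; the real call site uses init_chat_model.

```python
from dataclasses import dataclass, field

@dataclass
class KeyStore:
    # In-memory only; nothing is read from the process environment.
    _keys: dict[str, str] = field(default_factory=dict)

    def get(self, provider: str) -> str:
        return self._keys[provider]

def make_client(provider: str, model: str, keys: KeyStore) -> dict[str, str]:
    api_key = keys.get(provider)  # explicit kwarg source, no env lookup
    # Real code (sketch): init_chat_model(f"{provider}:{model}", api_key=api_key)
    return {"provider": provider, "model": model, "api_key": api_key}
```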

LangChain provider packages

What. Per-provider LangChain integrations: langchain-anthropic, langchain-openai, langchain-google-genai, langchain-ollama, langchain-nvidia-ai-endpoints. Why. Each provider has its own client + auth + retry semantics; the LangChain wrappers normalise them. How. All five are declared runtime dependencies; init_chat_model("anthropic:claude-...") dispatches to the right wrapper based on the provider prefix.
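
The prefix dispatch can be sketched as a split on the first colon followed by a table lookup. The prefix-to-package mapping below is illustrative of the mechanism, not a verbatim copy of init_chat_model's internal table.

```python
WRAPPERS = {
    # Illustrative prefix → wrapper-package mapping.
    "anthropic": "langchain-anthropic",
    "openai": "langchain-openai",
    "google_genai": "langchain-google-genai",
    "ollama": "langchain-ollama",
    "nvidia": "langchain-nvidia-ai-endpoints",
}

def dispatch(spec: str) -> tuple[str, str]:
    # "anthropic:claude-..." → (wrapper package, model name)
    provider, _, model = spec.partition(":")
    if provider not in WRAPPERS:
        raise ValueError(f"unknown provider prefix: {provider!r}")
    return WRAPPERS[provider], model
```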

anthropic, google-genai (raw SDKs)

What. Provider raw SDKs. Why. The PDF orchestrator’s _extract_via_llm calls the raw SDK directly because the orchestrator’s contract is a single non-streaming JSON response with PHI-redacted text — heavier LangChain machinery is overkill here. How. scripts.extraction.pdf_pipeline._extract_via_llm() dispatches on provider ∈ {anthropic, google, gemini, google-genai}.
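
The dispatch can be sketched as a small alias table: google, gemini, and google-genai all resolve to the Google SDK path, anthropic to the Anthropic SDK. pick_sdk is an illustrative stand-in for the branch inside _extract_via_llm.

```python
def pick_sdk(provider: str) -> str:
    # Assumed aliasing: several provider spellings map to one SDK path.
    if provider == "anthropic":
        return "anthropic"
    if provider in {"google", "gemini", "google-genai"}:
        return "google-genai"
    raise ValueError(f"unsupported provider: {provider!r}")
```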

Streamlit ≥ 1.38, < 2.0

What. Web UI framework. Why. Fast prototyping; built-in session_state and chat widgets. The chat UI intentionally has no file-upload surface; source data enters through the audited extraction pipeline. How. scripts/ai_assistant/web_ui.py is the entry point; UI primitives are factored into scripts/ai_assistant/ui/{wizard,chat,conversations,streaming,...}.py. Theme + bridge JS live in scripts/ai_assistant/ui/assets/.

Plotly + Kaleido

What. Interactive charts (Plotly) + headless export (Kaleido). Why. run_python_analysis renders model output as Plotly figures; Kaleido exports them as PNG so the persisted analysis .py file produces reproducible images on a fresh run. How. Used inside the sandbox subprocess child only — the agent’s parent process does not import plotly.

Runtime — security

scripts.security.* (in-tree)

What. The PHI handling surface lives entirely in-tree.

Why. No external dependency for PHI handling; auditors can read every line of the security surface without trusting an upstream maintainer. How. See PHI Architecture for the full design.

cryptography (HMAC + secure_zero_fill)

What. A thin wrapper over the standard library’s HMAC-SHA256 and secure-random primitives. Why. Used for per-subject SANT date jitter and ID pseudonymization. How. scripts.security.phi_scrub.pseudo_id(), scripts.security.phi_scrub.date_offset_days().
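
A hedged sketch of HMAC-based pseudonymization and deterministic per-subject date jitter. The function names mirror pseudo_id and date_offset_days in scripts.security.phi_scrub, but the digest truncation, prefix, and offset range here are illustrative choices, not the project's exact parameters.

```python
import hashlib
import hmac

def pseudo_id(secret: bytes, subject_id: str) -> str:
    # HMAC-SHA256 keyed by a project secret; truncated hex as the pseudonym.
    digest = hmac.new(secret, subject_id.encode(), hashlib.sha256).hexdigest()
    return f"SUBJ-{digest[:12]}"

def date_offset_days(secret: bytes, subject_id: str, max_offset: int = 30) -> int:
    # Deterministic per subject: same key + subject always yields the same
    # shift, so all dates for one subject move together.
    digest = hmac.new(secret, (subject_id + ":date").encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % (2 * max_offset + 1) - max_offset
```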

Runtime — observability

Python logging (with custom redactor)

What. Standard logging. Why. Familiar API; the redactor is a single logging.Filter so we don’t need a logging-framework dependency. How. scripts.utils.log_hygiene.install_phi_redactor() attaches a filter to the root logger that scrubs API keys + PHI patterns from every log line at format time.
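
The single-Filter pattern can be sketched as below. The redaction pattern set here (API-key-shaped tokens) is illustrative; the project's filter scrubs a fuller catalog of API-key and PHI patterns.

```python
import logging
import re

# Illustrative pattern: catch sk-... style API keys.
_KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{8,}")

class RedactFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Rewrite the fully-formatted message, then drop the args so the
        # scrubbed text is what every handler formats.
        record.msg = _KEY_RE.sub("[REDACTED]", record.getMessage())
        record.args = ()
        return True  # never drop the record, only scrub it

def install_redactor() -> None:
    # Attach once to the root logger so every log line passes through it.
    logging.getLogger().addFilter(RedactFilter())
```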

structlog (deferred)

What. Not currently used. Why mentioned. Open question whether to migrate to structlog for structured logging in a future phase; for now standard logging is sufficient.

Development

pip-audit

What. Dependency vulnerability scanner. Why. Catches known CVEs in pinned dependencies before they reach production. How. Runs on demand via make security and should be included in local release verification.

Sphinx + sphinx-rtd-theme + myst-parser

What. Documentation generator. Why. RST + autodoc gives free API reference from docstrings; mature toctree semantics. How. make docs builds; make docs-quality runs the doc-freshness lint and a -W (warnings as errors) Sphinx rebuild. CI gate at .github/workflows/docs-quality-check.yml.

Custom type stubs

typings/ ships in-tree stubs for two providers whose upstream typing is incomplete:

  • typings/anthropic/ — covers the raw SDK’s messages.create / messages.stream surface used by the PDF orchestrator + legacy raw-PDF path.

  • typings/google/ — covers the google.genai.Client.models.generate_content surface.

The mypy config picks up typings/ automatically via mypy_path.

Pinning policy

  • Major versions pinned with caret semantics for runtime deps that the agent talks to (LangChain, Anthropic, Google) — e.g. langchain>=1.0.0,<2.0.0. Reason: provider APIs evolve; we catch the v2 break in CI before it reaches production.

  • Streamlit pinned to >=1.38, <2.0 because st.session_state semantics changed materially across major versions.

  • All other deps pinned with >= only; uv.lock records the resolved versions reproducibly.

Where this is enforced: pyproject.toml (top-level + dev / test / docs optional groups). The lockfile (uv.lock) is the source of truth for the installed tree.