Tech Stack
Every runtime and development dependency, grouped by role, with one paragraph each on
what it is, why it was chosen, and how the project uses
it. Pinned versions and rationale live in pyproject.toml.
Runtime — language and tooling
Python 3.11+
What. The host language. Why. Required for
concurrent.futures clean shutdown semantics, asyncio.timeout,
and the X | Y union syntax used throughout the codebase. How.
pyproject.toml pins requires-python = ">=3.11"; CI matrix
runs against 3.11 / 3.12 / 3.13.
uv
What. A Rust-based pip / poetry / pipx replacement.
Why. 10-100× faster lockfile resolution; reproducible
environments. How. Project-wide convention: uv sync
--all-groups to install, uv run to invoke. The Makefile assumes
uv; CI installs it via the official astral-sh/setup-uv action,
pinned to an immutable commit SHA.
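A hypothetical Makefile fragment illustrating the convention; the target names are illustrative, not the project's actual Makefile:

```make
# Every target goes through uv, never a bare pip or python.
.PHONY: install test

install:
	uv sync --all-groups   # resolve + install from uv.lock

test:
	uv run pytest          # invoke inside the uv-managed environment
```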
Ruff
What. A fast Rust-based Python linter + formatter.
Why. Single tool that replaces flake8, isort, pyupgrade, and
includes S (flake8-bandit) security rules. How. Configuration
at pyproject.toml:215-241 (the [tool.ruff.lint] section).
S101 (assert) is per-file-ignored for tests/, where bare asserts are the
pytest idiom; S603 (subprocess) is suppressed at our hardened
subprocess.run call sites with # noqa: S603.
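A hedged sketch of what that configuration looks like in TOML; the authoritative selection lives in pyproject.toml, and the exact rule list may differ:

```toml
# Hypothetical sketch of the [tool.ruff.lint] section described above.
[tool.ruff.lint]
select = ["E", "F", "I", "UP", "S"]   # isort (I), pyupgrade (UP), flake8-bandit (S)

[tool.ruff.lint.per-file-ignores]
"tests/**" = ["S101"]                 # bare assert is the pytest idiom
```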
mypy
What. Static type checker. Why. Catches a class of LLM-flow
bugs such as nullable provider names reaching SDK constructors. How.
pyproject.toml configures ignore_missing_imports = true so
optional deps don’t block; custom stubs live in typings/ for
google.genai and anthropic.
Pytest
What. Test runner. Why. Mature ecosystem, conftest.py
fixtures, deterministic markers. How.
Tests follow the project's test-file conventions. make test runs
the deterministic subset, which excludes the AI Assistant construction
smokes; make test-all runs the full suite.
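One way to express the deterministic/smoke split is a pytest marker; the marker name below is hypothetical, not the project's actual convention:

```toml
# Hypothetical sketch of a marker-based split in pyproject.toml.
[tool.pytest.ini_options]
markers = [
    "ai_smoke: AI Assistant construction smokes (excluded from the default run)",
]
```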
Runtime — pipeline
pandas
What. Tabular dataframe library. Why. Excel reading, JSONL
output, dataset cleanup, k-anonymity equivalence-class lookups all
ride on pandas. How.
scripts.extraction.dataset_pipeline reads the raw Excel into
a DataFrame; per-row records are serialised to JSONL with the
provenance dict.
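A minimal sketch of the row-to-JSONL step, assuming illustrative column names and provenance fields; the real schema lives in scripts.extraction.dataset_pipeline:

```python
import json

import pandas as pd


def rows_to_jsonl(df: pd.DataFrame, provenance: dict) -> str:
    """Serialise each row as one JSON line, attaching the provenance dict."""
    lines = []
    for record in df.to_dict(orient="records"):
        record["provenance"] = provenance
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# In the real pipeline the frame comes from pd.read_excel(...); a toy frame here.
df = pd.DataFrame({"subject_id": ["S1", "S2"], "visit": [1, 2]})
print(rows_to_jsonl(df, {"source": "raw.xlsx", "sheet": "Dataset"}))
```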
openpyxl
What. Excel .xlsx reader/writer. Why. pandas’s default
.xlsx engine. How. Used implicitly by pd.read_excel for
the dictionary + dataset legs.
pypdf
What. Lightweight PDF text extractor. Why. Powers the legacy
raw-PDF API path. How. Used in
scripts.extraction.extract_pdf_data when the operator opts
into the gated raw-PDF API path with the two-part attestation.
pdfplumber
What. Layout-aware PDF extractor. Why. Per-character bounding
boxes give better structure recovery than pypdf for complex
multi-section CRFs. How. pdfplumber is
the always-on code path inside the two-way PDF orchestrator
(scripts.extraction.pdf_pipeline). Extracted text is
PHI-redacted before any LLM call; the LLM response is merged with
the code candidate via _merge.
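A guess at the shape of the merge step, assuming deterministically extracted values win and the LLM fills the gaps; the real precedence rules live in _merge inside scripts.extraction.pdf_pipeline:

```python
def merge_candidates(code_candidate: dict, llm_response: dict) -> dict:
    """Prefer deterministically extracted values; let the LLM fill the gaps.

    Hypothetical sketch only -- field names below are illustrative.
    """
    merged = dict(llm_response)
    # None means "the code path could not recover this field".
    merged.update({k: v for k, v in code_candidate.items() if v is not None})
    return merged


print(merge_candidates({"site": "04", "visit": None}, {"visit": "V2", "arm": "B"}))
# → {'visit': 'V2', 'arm': 'B', 'site': '04'}
```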
PyYAML
What. YAML parser. Why. The PHI scrub catalog
(scripts/security/phi_scrub.yaml) and the study-knowledge
overlay (config/study_knowledge.yaml) ship as YAML so domain
experts can edit without touching code. How. Loaded once at
import time; cached.
Runtime — agent
LangChain + LangGraph
What. LLM-agent framework. Why. init_chat_model gives
provider-agnostic construction (Anthropic / OpenAI / Google / Ollama
/ NVIDIA all behind one API); LangGraph’s ReAct prebuilt is the
agent topology. How. scripts.ai_assistant.agent_graph is
the only module that constructs an LLM client; every client takes
api_key= as an explicit kwarg sourced from the in-memory
KeyStore — no os.environ lookup at construction time.
LangChain provider packages
What. Per-provider LangChain integrations: langchain-anthropic,
langchain-openai, langchain-google-genai,
langchain-ollama, langchain-nvidia-ai-endpoints. Why.
Each provider has its own client + auth + retry semantics; the
LangChain wrappers normalise them. How. All five are declared
runtime dependencies; init_chat_model("anthropic:claude-...")
dispatches to the right wrapper based on the provider prefix.
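The prefix dispatch can be sketched as a plain string split; this is illustrative only, since LangChain's own parsing lives inside init_chat_model, and the default provider and model name here are invented:

```python
def split_model_spec(spec: str, default_provider: str = "anthropic") -> tuple[str, str]:
    """Split a "provider:model" spec into its two halves (sketch)."""
    provider, sep, model = spec.partition(":")
    if not sep:
        # Bare model name with no prefix: fall back to a default provider.
        return default_provider, spec
    return provider, model


print(split_model_spec("anthropic:claude-sonnet-4"))
# → ('anthropic', 'claude-sonnet-4')
```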
anthropic, google-genai (raw SDKs)
What. Provider raw SDKs. Why. The PDF orchestrator’s
_extract_via_llm calls the raw SDK directly because the
orchestrator’s contract is a single non-streaming JSON response
with PHI-redacted text — heavier LangChain machinery is overkill
here. How.
scripts.extraction.pdf_pipeline._extract_via_llm() dispatches
on provider ∈ {anthropic, google, gemini, google-genai}.
Streamlit ≥ 1.38, < 2.0
What. Web UI framework. Why. Fast prototyping; built-in
session_state and chat widgets. The chat UI intentionally has no
file-upload surface; source data enters through the audited extraction
pipeline. How.
scripts/ai_assistant/web_ui.py is the entry; UI primitives
factored into scripts/ai_assistant/ui/{wizard,chat,conversations,
streaming,...}.py. Theme + bridge JS in
scripts/ai_assistant/ui/assets/.
Plotly + Kaleido
What. Interactive charts (Plotly) + headless export (Kaleido).
Why. run_python_analysis renders model output as Plotly
figures; Kaleido exports them as PNG so the persisted analysis
.py file produces reproducible images on a fresh run. How.
Used inside the sandbox subprocess child only — the agent’s parent
process does not import plotly.
Runtime — security
scripts.security.* (in-tree)
What. The PHI handling surface lives entirely in-tree:
- scripts.security.phi_scrub — 8-action honest-broker catalog
- scripts.security.phi_patterns — shared regex catalog
- scripts.security.phi_allowlist — clinical-phrase exemption
- scripts.security.phi_gate — agent-output gate
- scripts.security.kanon_gate — k-anon (k=5) + l-diversity (l=2)
- scripts.security.secure_env — zone guards
Why. No external dependency for PHI handling — auditors can read every line of the security surface without trusting an upstream maintainer. How. See PHI Architecture for the full architecture.
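As one illustration of the gates above, a minimal k-anonymity check fits in a few lines; the real gate is scripts.security.kanon_gate and is certainly more involved, so treat this as a sketch of the idea only:

```python
from collections import Counter


def passes_k_anonymity(rows: list[dict], quasi_identifiers: list[str], k: int = 5) -> bool:
    """Every equivalence class over the quasi-identifiers must hold >= k rows."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in classes.values())


# Illustrative quasi-identifiers; the second class has only one row, so k=5 fails.
rows = [{"age_band": "40-49", "zip3": "941"}] * 5 + [{"age_band": "50-59", "zip3": "941"}]
print(passes_k_anonymity(rows, ["age_band", "zip3"]))  # → False
```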
cryptography (HMAC + secure_zero_fill)
What. A thin wrapper over the standard library's HMAC-SHA256 and
secure-random primitives. Why. Used for per-subject SANT date jitter and ID
pseudonymization. How. scripts.security.phi_scrub.pseudo_id(),
scripts.security.phi_scrub.date_offset_days().
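A hedged sketch of the two primitives, assuming an HMAC-SHA256 construction keyed by a KeyStore-held secret; the function names mirror the ones above, but the real parameters and output format differ:

```python
import hashlib
import hmac


def pseudo_id(secret: bytes, subject_id: str, length: int = 12) -> str:
    """Deterministic HMAC-SHA256 pseudonym for a subject ID (sketch)."""
    digest = hmac.new(secret, subject_id.encode(), hashlib.sha256).hexdigest()
    return digest[:length]


def date_offset_days(secret: bytes, subject_id: str, max_offset: int = 30) -> int:
    """Per-subject deterministic jitter in [-max_offset, +max_offset] (sketch)."""
    digest = hmac.new(secret, (subject_id + ":date").encode(), hashlib.sha256).digest()
    raw = int.from_bytes(digest[:4], "big")
    return raw % (2 * max_offset + 1) - max_offset


secret = b"keystore-held-secret"  # illustrative; the real key never lives in code
print(pseudo_id(secret, "SUBJ-001"))
print(date_offset_days(secret, "SUBJ-001"))
```

The same subject always maps to the same pseudonym and offset, so longitudinal structure survives while raw IDs and dates do not.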
Runtime — observability
Python logging (with custom redactor)
What. Standard logging. Why. Familiar API; the redactor is a
single logging.Filter so we don’t need a logging-framework
dependency. How.
scripts.utils.log_hygiene.install_phi_redactor() attaches a
filter to the root logger that scrubs API keys + PHI patterns from
every log line at format time.
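A minimal sketch of such a redacting logging.Filter; the key pattern below is illustrative, and the real installer with the full PHI pattern set is scripts.utils.log_hygiene.install_phi_redactor:

```python
import logging
import re

API_KEY_RE = re.compile(r"sk-[A-Za-z0-9-]{8,}")  # illustrative secret pattern only


class RedactorFilter(logging.Filter):
    """Rewrite each record in place; never drop it, only scrub it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()          # apply %-args first
        record.msg = API_KEY_RE.sub("[REDACTED]", message)
        record.args = ()                       # message is now fully formatted
        return True


record = logging.LogRecord("demo", logging.INFO, "app.py", 1,
                           "key=%s", ("sk-abcdefgh1234",), None)
RedactorFilter().filter(record)
print(record.getMessage())  # → key=[REDACTED]
```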
structlog (deferred)
What. Not currently used. Why mentioned. Open question
whether to migrate to structlog for structured logging in a
future phase; for now standard logging is sufficient.
Development
pip-audit
What. Dependency vulnerability scanner. Why. Catches known
CVEs in pinned dependencies before they reach production. How.
Runs on demand via make security and should be included in local
release verification.
Sphinx + sphinx-rtd-theme + myst-parser
What. Documentation generator. Why. RST + autodoc gives
free API reference from docstrings; mature toctree semantics. How.
make docs builds; make docs-quality runs the doc-freshness
lint and a -W (warnings as errors) Sphinx rebuild. CI gate at
.github/workflows/docs-quality-check.yml.
Custom type stubs
typings/ ships in-tree stubs for two providers whose upstream
typing is incomplete:
- typings/anthropic/ — covers the raw SDK's messages.create / messages.stream surface used by the PDF orchestrator + legacy raw-PDF path.
- typings/google/ — covers the google.genai.Client.models.generate_content surface.
The mypy config picks up typings/ automatically via
mypy_path.
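A hypothetical sketch of those settings in pyproject.toml:

```toml
# Hypothetical sketch of the mypy settings described above.
[tool.mypy]
ignore_missing_imports = true   # optional provider deps don't block checking
mypy_path = "typings"           # in-tree stubs for anthropic / google.genai
```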
Pinning policy
- Major versions pinned with caret semantics for runtime deps that the agent talks to (LangChain, Anthropic, Google) — e.g. langchain>=1.0.0,<2.0.0. Reason: provider APIs evolve; we catch the v2 break in CI before it reaches production.
- Streamlit pinned to >=1.38, <2.0 because st.session_state semantics changed materially across major versions.
- All other deps pinned with >= only; uv.lock records the resolved versions reproducibly.
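The same three tiers, as a hypothetical pyproject.toml fragment; the package floors are illustrative, not the project's actual pins:

```toml
# Hypothetical sketch of the pinning tiers.
dependencies = [
    "langchain>=1.0.0,<2.0.0",   # agent-facing: catch the v2 break in CI
    "streamlit>=1.38,<2.0",      # session_state semantics shift across majors
    "pandas>=2.0",               # everything else: floor only; uv.lock resolves
]
```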
Where this is enforced: pyproject.toml (top-level + dev /
test / docs optional groups). The lockfile (uv.lock) is the
source of truth for the installed tree.