Testing

This page documents the test and verification gates that are active in the repository today. It intentionally avoids static test counts; the authoritative count is the pytest output for the commit under review.

Active Commands

Run these from the repository root:

make test          # deterministic subset; excludes selected AI Assistant construction tests
make test-all      # full pytest suite
make lint          # ruff check --fix + ruff format
make typecheck     # mypy scripts/ main.py config.py
make security      # pip-audit dependency scan
make verify        # fast local readiness gate
make docs-quality  # doc freshness + Sphinx warnings-as-errors build
make release-check # full release readiness gate

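Direct invocations of the underlying tools. The Ruff commands here are check-only, whereas make lint applies fixes and reformats files: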

uv run pytest tests/
uv run ruff check .
uv run ruff format --check .
uv run mypy scripts/ main.py config.py --ignore-missing-imports
uv run python scripts/lint_doc_freshness.py

Current Test Layout

The suite is intentionally flat except for security-focused tests:

tests/
├── conftest.py
├── test_agent_graph.py
├── test_agent_tools.py
├── test_dataset_pipeline.py
├── test_extract_pdf_data.py
├── test_load_dictionary.py
├── test_phi_scrub.py
├── test_run_study_analysis.py
├── test_web_ui.py
├── test_*.py
└── security/
    ├── test_adversarial_phi_safe.py
    ├── test_kanon_l_diversity.py
    ├── test_keystore.py
    ├── test_pdf_redaction_pipeline.py
    ├── test_sandbox_isolation.py
    └── test_*.py

There are no active tests/ai_assistant/ or tests/extraction/ subpackages. Agent, extraction, UI, and pipeline tests live as top-level tests/test_*.py modules; security regression tests live under tests/security/.

What Each Gate Proves

make test

Runs the deterministic pytest subset. Use it for fast local checks when your change does not touch LLM construction, CLI provider selection, or telemetry surfaces.
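
How make test selects its subset is defined in the Makefile and is not reproduced here. As a rough sketch only, one way to express such an exclusion is pytest's --deselect option, which drops tests by node ID prefix. The helper name and the node ID below are hypothetical, chosen purely for illustration.

```python
def build_pytest_args(test_root: str, deselected: list[str]) -> list[str]:
    """Build a pytest argv that runs test_root minus the given node IDs."""
    args = ["pytest", test_root]
    # Each --deselect entry removes one test (or node ID prefix) from the run.
    args.extend(f"--deselect={node_id}" for node_id in deselected)
    return args


# Hypothetical node ID; the real exclusions are whatever the Makefile names.
EXAMPLE_EXCLUDED = ["tests/test_example.py::test_builds_live_client"]

print(" ".join(build_pytest_args("tests/", EXAMPLE_EXCLUDED)))
```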

make test-all

Runs the full suite. Use it before PRs that touch PHI handling, agent tools, provider construction, pipeline flow, or public docs.

make verify

Runs the fast local readiness check used by the maintainer workflow: Ruff, mypy, and presence checks for load-bearing security modules. It is not a substitute for make test-all on high-risk changes.
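
The presence checks can be pictured as a file-existence scan. The sketch below is an assumption about their general shape, not the code behind make verify; the function name is hypothetical and the caller supplies the list of required paths.

```python
from pathlib import Path


def find_missing_files(repo_root: Path, required: list[str]) -> list[str]:
    """Return the entries of `required` that are not regular files under repo_root."""
    # Paths are relative to the repository root; directories do not count as present.
    return [rel for rel in required if not (repo_root / rel).is_file()]
```

A gate built this way would fail whenever the returned list is non-empty, so a deleted or renamed module is caught before any test runs.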

make docs-quality

Runs scripts/lint_doc_freshness.py and builds Sphinx with warnings treated as errors. This is required for documentation changes and for code changes that alter public behavior.

make security

Runs pip-audit against the locked environment. It is the local dependency-vulnerability gate; it does not replace code review for application-layer security.

make release-check

Runs the verify, typecheck, test-all, docs-ci, and security targets. This is the pre-tag gate for release-candidate builds.
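
To illustrate what chaining gates means in practice, here is a minimal sketch of a runner that executes targets in order and stops at the first failure, which is also how Make treats a failing prerequisite by default. The function names are hypothetical; the target list is copied from the description above, and the Makefile remains the authority on the actual order.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Target names as listed for release-check in this page.
RELEASE_GATES = ("verify", "typecheck", "test-all", "docs-ci", "security")


def run_make_target(target: str) -> int:
    """Run `make <target>` and return its exit code."""
    return subprocess.run(["make", target], check=False).returncode


def first_failing_gate(
    gates: Sequence[str],
    runner: Callable[[str], int] = run_make_target,
) -> Optional[str]:
    """Run gates in order; return the first one that exits non-zero, else None."""
    for gate in gates:
        if runner(gate) != 0:
            return gate  # Later gates are not run once one has failed.
    return None
```

Passing the runner as a parameter keeps the ordering logic testable without invoking make.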

PHI-Critical Coverage

PHI and boundary behavior is covered by dedicated tests across the normal and security suites:

tests/test_phi_scrub.py
tests/test_phi_gate.py
tests/test_phi_safe_input_gates.py
tests/test_agent_tools_phi_safe.py
tests/test_file_access.py
tests/test_secure_env.py
tests/test_secure_staging.py
tests/test_log_hygiene.py
tests/test_lineage_manifest.py
tests/test_pdf_phi_flag.py
tests/test_pipeline_provenance.py
tests/security/test_adversarial_phi_safe.py
tests/security/test_kanon_l_diversity.py
tests/security/test_keystore.py
tests/security/test_llm_capabilities.py
tests/security/test_llm_construction_smoke.py
tests/security/test_log_hygiene_keys.py
tests/security/test_no_keys_in_parent_environ.py
tests/security/test_pdf_redaction_pipeline.py
tests/security/test_phase2_pipeline_polish.py
tests/security/test_phase2_polish_permissions.py
tests/security/test_sandbox_isolation.py

The IRB conformance matrix maps each regulated claim to the specific test or test family that guards it.

Writing Tests

Use pytest and keep tests close to the behavior they protect.

Naming rules:

  • Test files use test_<module_or_behavior>.py.

  • Test classes use Test<Behavior> when grouping scenarios adds clarity.

  • Test names describe the behavior and edge case, not the implementation detail.
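
As an illustration of these naming rules, the sketch below groups two scenarios under a Test<Behavior> class. The helper normalize_suffix is hypothetical and exists only to give the tests something to exercise; in the repository this file would be named along the lines of test_normalize_suffix.py.

```python
def normalize_suffix(name: str) -> str:
    """Hypothetical helper under test: return the lowercase suffix, or "" if none."""
    dot = name.rfind(".")
    # A leading dot (hidden file) or no dot at all means there is no suffix.
    if dot <= 0:
        return ""
    return name[dot:].lower()


class TestNormalizeSuffix:
    # Names state the behavior and the edge case, not how the helper is written.
    def test_uppercase_suffix_is_lowercased(self) -> None:
        assert normalize_suffix("VISIT.XLSX") == ".xlsx"

    def test_name_without_suffix_returns_empty_string(self) -> None:
        assert normalize_suffix("README") == ""
```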

Pattern:

from pathlib import Path

import pandas as pd

from scripts.extraction.dataset_pipeline import extract_single_dataset


def test_extract_single_dataset_rejects_unsupported_suffix(tmp_path: Path) -> None:
    unsupported = tmp_path / "legacy.ods"
    unsupported.write_bytes(b"fake")

    success, count, error = extract_single_dataset(
        unsupported,
        tmp_path / "out",
        "Indo-VAP",
        "2026-04-28T00:00:00+00:00",
    )

    assert success is False
    assert count == 0
    assert error is not None

Prefer real filesystem fixtures for path/zone behavior. Mock only network calls, LLM clients, time-sensitive surfaces, and hard-to-trigger error branches.
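
A minimal sketch of that split, assuming a hypothetical summarize_file function and client interface (neither is taken from this repository): the file lives on a real temporary path supplied by pytest's tmp_path fixture, while the LLM client is replaced by a deterministic fake.

```python
from pathlib import Path
from typing import Protocol


class ChatClient(Protocol):
    """Hypothetical interface for an LLM client."""

    def complete(self, prompt: str) -> str: ...


class FakeChatClient:
    """Deterministic stand-in that records every prompt it receives."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def summarize_file(path: Path, client: ChatClient) -> str:
    """Hypothetical function under test: send the file's text to the client."""
    text = path.read_text(encoding="utf-8")
    return client.complete(f"Summarize:\n{text}")


def test_summarize_file_sends_file_contents_to_client(tmp_path: Path) -> None:
    # Real file on disk, so path handling is exercised rather than mocked.
    source = tmp_path / "notes.txt"
    source.write_text("visit 1: stable", encoding="utf-8")
    client = FakeChatClient(reply="ok")

    assert summarize_file(source, client) == "ok"
    assert client.prompts == ["Summarize:\nvisit 1: stable"]
```

A hand-written fake like this keeps the test independent of network access and model output while still asserting on exactly what would have been sent.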

CI Behavior

.github/workflows/ci.yml runs Ruff, mypy, the full pytest suite, and pip-audit on Python 3.11, 3.12, and 3.13 for pushes and PRs.

.github/workflows/docs-quality-check.yml runs the doc-freshness linter, builds Sphinx, runs linkcheck, and reports size/version drift for documentation-touching pushes and PRs.

When a change touches security, PHI boundaries, provider construction, or the pipeline publish path, include the local verification transcript in the PR description.