Testing
=======

This page documents the test and verification gates that are active in
the repository today. It intentionally avoids static test counts; the
authoritative count is the pytest output for the commit under review.

Active Commands
---------------

Run these from the repository root:

.. code-block:: bash

   make test          # deterministic subset; excludes selected AI Assistant construction tests
   make test-all      # full pytest suite
   make lint          # ruff check --fix + ruff format
   make typecheck     # mypy scripts/ main.py config.py
   make security      # pip-audit dependency scan
   make verify        # fast local readiness gate
   make docs-quality  # doc freshness + Sphinx warnings-as-errors build
   make release-check # full release readiness gate

Direct equivalents:

.. code-block:: bash

   uv run pytest tests/
   uv run ruff check .
   uv run ruff format --check .
   uv run mypy scripts/ main.py config.py --ignore-missing-imports
   uv run python scripts/lint_doc_freshness.py

Current Test Layout
-------------------

The suite is intentionally flat except for security-focused tests:

.. code-block:: text

   tests/
   ├── conftest.py
   ├── test_agent_graph.py
   ├── test_agent_tools.py
   ├── test_dataset_pipeline.py
   ├── test_extract_pdf_data.py
   ├── test_load_dictionary.py
   ├── test_phi_scrub.py
   ├── test_run_study_analysis.py
   ├── test_web_ui.py
   ├── test_*.py
   └── security/
       ├── test_adversarial_phi_safe.py
       ├── test_kanon_l_diversity.py
       ├── test_keystore.py
       ├── test_pdf_redaction_pipeline.py
       ├── test_sandbox_isolation.py
       └── test_*.py

There are no active ``tests/ai_assistant/`` or ``tests/extraction/``
subpackages. Agent, extraction, UI, and pipeline tests live as
top-level ``tests/test_*.py`` modules; security regression tests live
under ``tests/security/``.

What Each Gate Proves
---------------------

``make test``
   Runs the deterministic pytest subset. Use it for fast local checks
   when you did not touch LLM construction, CLI provider selection, or
   telemetry surfaces.

``make test-all``
   Runs the full suite. Use it before PRs that touch PHI handling,
   agent tools, provider construction, pipeline flow, or public docs.

``make verify``
   Runs the fast local readiness check used by the maintainer workflow:
   Ruff, mypy, and presence checks for load-bearing security modules.
   It is not a substitute for ``make test-all`` on high-risk changes.

``make docs-quality``
   Runs ``scripts/lint_doc_freshness.py`` and builds Sphinx with
   warnings treated as errors. This is required for documentation
   changes and for code changes that alter public behavior.

``make security``
   Runs ``pip-audit`` against the locked environment. It is the local
   dependency-vulnerability gate; it does not replace code review for
   application-layer security.

``make release-check``
   Runs ``verify → typecheck → test-all → docs-ci → security``. This is
   the pre-tag gate for release-candidate builds.

PHI-Critical Coverage
---------------------

PHI and boundary behavior is covered by dedicated tests across the
normal and security suites:

.. code-block:: text

   tests/test_phi_scrub.py
   tests/test_phi_gate.py
   tests/test_phi_safe_input_gates.py
   tests/test_agent_tools_phi_safe.py
   tests/test_file_access.py
   tests/test_secure_env.py
   tests/test_secure_staging.py
   tests/test_log_hygiene.py
   tests/test_lineage_manifest.py
   tests/test_pdf_phi_flag.py
   tests/test_pipeline_provenance.py
   tests/security/test_adversarial_phi_safe.py
   tests/security/test_kanon_l_diversity.py
   tests/security/test_keystore.py
   tests/security/test_llm_capabilities.py
   tests/security/test_llm_construction_smoke.py
   tests/security/test_log_hygiene_keys.py
   tests/security/test_no_keys_in_parent_environ.py
   tests/security/test_pdf_redaction_pipeline.py
   tests/security/test_phase2_pipeline_polish.py
   tests/security/test_phase2_polish_permissions.py
   tests/security/test_sandbox_isolation.py

The IRB conformance matrix maps each regulated claim to the specific
test or test family that guards it.

Writing Tests
-------------

Use pytest and keep tests close to the behavior they protect.

Naming rules:

* Test files use ``test_<module_or_behavior>.py``.
* Test classes use ``Test<Behavior>`` when grouping scenarios adds
  clarity.
* Test names describe the behavior and edge case, not the implementation
  detail.

Pattern:

.. code-block:: python

   from pathlib import Path

   import pandas as pd

   from scripts.extraction.dataset_pipeline import extract_single_dataset


   def test_extract_single_dataset_rejects_unsupported_suffix(tmp_path: Path) -> None:
       unsupported = tmp_path / "legacy.ods"
       unsupported.write_bytes(b"fake")

       success, count, error = extract_single_dataset(
           unsupported,
           tmp_path / "out",
           "Indo-VAP",
           "2026-04-28T00:00:00+00:00",
       )

       assert success is False
       assert count == 0
       assert error is not None

Prefer real filesystem fixtures for path/zone behavior. Mock only
network calls, LLM clients, time-sensitive surfaces, and hard-to-trigger
error branches.

CI Behavior
-----------

``.github/workflows/ci.yml`` runs Ruff, mypy, the full pytest suite,
and ``pip-audit`` on Python 3.11, 3.12, and 3.13 for pushes and PRs.

``.github/workflows/docs-quality-check.yml`` runs the doc-freshness
linter, builds Sphinx, runs linkcheck, and reports size/version drift for
documentation-touching pushes and PRs.

When a change touches security, PHI boundaries, provider construction, or
the pipeline publish path, include the local verification transcript in
the PR description.