References

What. Every regulation, standard, paper, and external resource cited in the RePORT AI Portal codebase or the IRB/Auditor profile, collected in one place, with URLs and a one-line note on which pillar or module each one backs.

Why. Regulatory traceability is a developer concern. If you’re touching the PHI scrubber’s catalog or the agent-boundary gate, you should be able to reach the primary source for HIPAA §164.514(b)(2)(i) or ICMR §11.7 from the docs in one click. If you’re adding a new rule class, you should know which regulation the rule answers.

How. Grouped by concern area (regulation / standard / technique / benchmark / tool). Each entry includes a short “What we use it for” line pointing at the module or pillar the reference backs.

Primary Regulations

HIPAA Privacy Rule — §164.514 De-identification

DPDPA 2023 — Digital Personal Data Protection Act

SPDI Rules 2011 (under IT Act §43A)

Aadhaar Act 2016 — §29 restrictions on sharing identity information

ICMR National Ethical Guidelines for Biomedical & Health Research (2017)

ABDM Health Data Management Policy (NHA)

RePORT India Common Protocol

  • Project site. https://www.reportindia.org

  • What we use it for. The parent study protocol under which Indo-VAP runs. Dictates the 72-hour IRB notification window for PHI breaches, which the study team must encode in its breach-response runbook before production ingest.
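The runbook needs the 72-hour window encoded as a concrete deadline. The protocol fixes the window length, but this entry does not say what starts the clock, so the sketch below assumes it starts when the breach is discovered. `irb_notification_deadline` and `is_overdue` are hypothetical helpers that a runbook script or alerting job might use, not portal code.

```python
from datetime import datetime, timedelta, timezone

# Window length from the RePORT India Common Protocol entry above.
IRB_NOTIFICATION_WINDOW = timedelta(hours=72)


def irb_notification_deadline(discovered_at: datetime) -> datetime:
    """Latest time the IRB must be notified, ASSUMING the clock starts at discovery."""
    if discovered_at.tzinfo is None:
        # A naive timestamp is ambiguous when sites log in different time zones.
        raise ValueError("discovered_at must be timezone-aware")
    # Convert to UTC first so the 72 hours are elapsed time, not wall-clock time.
    return discovered_at.astimezone(timezone.utc) + IRB_NOTIFICATION_WINDOW


def is_overdue(discovered_at: datetime, now: datetime) -> bool:
    """True once the notification deadline has passed."""
    return now > irb_notification_deadline(discovered_at)


found = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(irb_notification_deadline(found).isoformat())  # 2024-01-04T09:00:00+00:00
```

Rejecting naive datetimes is deliberate. A breach logged at a site in IST and checked by a job running in UTC should produce the same deadline instant.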

Standards & Frameworks

NIST SP 800-188 — De-Identifying Government Datasets

NIST SP 800-175B — Guideline for Using Cryptographic Standards

NIST SP 800-53 (SI-7 Software, Firmware, and Information Integrity)

STROBE — Strengthening the Reporting of Observational Studies in Epidemiology

RECORD — REporting of studies Conducted using Observational Routinely-collected health Data

  • Text. https://www.record-statement.org

  • What we use it for. Extension of STROBE for routinely-collected data (EHR, registry). §3 backs NA-preservation behaviour — clinical strings like “NR” / “NA” / “NK” must not be coerced to Python None during extraction.
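NA preservation comes down to one rule: only a cell that is truly empty becomes `None`, and codes such as “NR”, “NA”, and “NK” pass through as data. The sketch below illustrates that rule. `normalize_cell`, `is_clinical_na`, and the exact code set are hypothetical and are not taken from the extraction code.

```python
from typing import Optional

# ASSUMED set of clinical "not available" codes; the real list may differ.
CLINICAL_NA_CODES = frozenset({"NR", "NA", "NK"})


def normalize_cell(raw: Optional[str]) -> Optional[str]:
    """Trim an extracted cell; only a genuinely empty cell becomes None."""
    if raw is None:
        return None
    value = raw.strip()
    # Empty after trimming means the source recorded nothing at all.
    if not value:
        return None
    # "NR" / "NA" / "NK" are clinical answers, not absence: keep them verbatim.
    return value


def is_clinical_na(value: Optional[str]) -> bool:
    """True when the cell carries an explicit clinical NA code."""
    return value is not None and value in CLINICAL_NA_CODES


row = [" 12.5 ", "NA", "", "NK", None]
print([normalize_cell(c) for c in row])  # ['12.5', 'NA', None, 'NK', None]
```

If extraction goes through pandas, note that `read_csv` treats the string "NA" as missing by default. Passing `keep_default_na=False`, optionally with an explicit `na_values` list, keeps it as text.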

CDISC SDTM / ODM — Clinical Data Interchange Standards

FDA 21 CFR Part 11 — Electronic Records + Electronic Signatures

HHS Honest Broker guidance

Techniques

SANT — Shift-And-Not-Truncate

  • Primary citation. El Emam et al., “A method for managing re-identification risk from small geographic areas in Canada,” BMC Medical Informatics and Decision Making, 2010.

  • What we use it for. Per-subject constant date offset so intra-subject intervals are preserved exactly — the scripts.security.phi_scrub.date_offset_days() algorithm.
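The sketch below shows why a per-subject constant offset preserves intervals. It does not reproduce scripts.security.phi_scrub.date_offset_days(). It assumes that the offset is derived from an HMAC of the subject ID under a secret key, that it is never zero, and that it is bounded to ±365 days. None of these choices is confirmed by the codebase, and `subject_offset_days` and `shift_date` are hypothetical names.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical bound: offsets fall in [-365, -1] or [1, 365] days.
MAX_SHIFT_DAYS = 365


def subject_offset_days(subject_id: str, key: bytes, max_shift: int = MAX_SHIFT_DAYS) -> int:
    """Derive one constant, non-zero day offset per subject from HMAC-SHA256."""
    digest = hmac.new(key, subject_id.encode("utf-8"), hashlib.sha256).digest()
    # First 8 bytes choose the magnitude (the slight modulo bias is tolerable here).
    magnitude = 1 + int.from_bytes(digest[:8], "big") % max_shift
    # Lowest bit of byte 8 chooses the direction of the shift.
    return magnitude if digest[8] & 1 else -magnitude


def shift_date(d: date, subject_id: str, key: bytes) -> date:
    """Shift a date by the subject's offset; never truncate it to month or year."""
    return d + timedelta(days=subject_offset_days(subject_id, key))


key = b"demo-key-not-for-production"
enrolled, follow_up = date(2021, 3, 1), date(2021, 3, 29)
gap_before = (follow_up - enrolled).days
gap_after = (shift_date(follow_up, "SUBJ-001", key) - shift_date(enrolled, "SUBJ-001", key)).days
print(gap_before, gap_after)  # 28 28
```

Because every date of a given subject moves by the same amount, the offset cancels out of any within-subject difference. Keying the derivation means the offset cannot be recomputed, and the shift undone, without the secret.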

k-anonymity — Sweeney 2002

l-diversity — Machanavajjhala et al. 2007
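The two entries above define complementary checks on a released table. The minimal sketch below implements the textbook definitions: k is the size of the smallest equivalence class over the quasi-identifiers, and distinct l-diversity (the simplest of Machanavajjhala et al.'s variants) is the fewest distinct sensitive values in any class. It is not the portal's implementation. The function names, column names, and toy rows are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Record = Dict[str, Hashable]


def _qi_key(record: Record, quasi_identifiers: Sequence[str]) -> Tuple[Hashable, ...]:
    """Equivalence-class key: the record's values on the quasi-identifier columns."""
    return tuple(record[q] for q in quasi_identifiers)


def k_anonymity(records: Sequence[Record], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest equivalence class (records must be non-empty)."""
    sizes = Counter(_qi_key(r, quasi_identifiers) for r in records)
    return min(sizes.values())


def distinct_l_diversity(records: Sequence[Record], quasi_identifiers: Sequence[str], sensitive: str) -> int:
    """Fewest distinct sensitive values found in any equivalence class."""
    values: Dict[Tuple[Hashable, ...], Set[Hashable]] = defaultdict(set)
    for r in records:
        values[_qi_key(r, quasi_identifiers)].add(r[sensitive])
    return min(len(v) for v in values.values())


# Toy rows for illustration only.
QI = ["age_band", "district"]
rows: List[Record] = [
    {"age_band": "30-39", "district": "Pune", "tb_status": "positive"},
    {"age_band": "30-39", "district": "Pune", "tb_status": "negative"},
    {"age_band": "40-49", "district": "Chennai", "tb_status": "positive"},
    {"age_band": "40-49", "district": "Chennai", "tb_status": "positive"},
]
print(k_anonymity(rows, QI), distinct_l_diversity(rows, QI, "tb_status"))  # 2 1
```

The toy table is 2-anonymous but only 1-diverse. Anyone who knows a subject falls in the 40-49 / Chennai class learns that subject's TB status without singling out a row. This is the homogeneity attack that l-diversity was introduced to catch.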

HMAC (RFC 2104)
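As a generic illustration of RFC 2104 HMAC, the sketch below uses it as a keyed pseudonymization primitive through Python's standard `hmac` module. `pseudonymize` and `token_matches` are hypothetical names, not portal functions, and the choice of SHA-256 is an assumption.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """Keyed pseudonym: hex-encoded HMAC-SHA256 (RFC 2104) of the identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def token_matches(identifier: str, key: bytes, token: str) -> bool:
    """Constant-time comparison, so timing does not leak how much of the token matched."""
    return hmac.compare_digest(pseudonymize(identifier, key), token)


key = b"demo-key-not-for-production"
token = pseudonymize("SUBJ-001", key)
print(len(token))                             # 64
print(token_matches("SUBJ-001", key, token))  # True
print(token_matches("SUBJ-002", key, token))  # False
```

A keyed HMAC is preferable to a bare hash for low-entropy identifiers. A 12-digit Aadhaar number has at most 10^12 possible values, so an unkeyed SHA-256 of it can be reversed by enumeration. The HMAC cannot be recomputed without the secret key.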

Benchmarks & Comparative Studies

Microsoft Presidio benchmarks (2024-2025)

John Snow Labs Clinical NER

  • Site. https://www.johnsnowlabs.com

  • What we use it for. Reference point for what commercial clinical NER can do (~98.6% F1). Not used at runtime — documented in ADR-004 as a “what we gave up” note.

i2b2 / n2c2 de-identification shared tasks

Tools & Libraries Cited in Decisions

pdfplumber

  • Site. https://github.com/jsvine/pdfplumber

  • What we use it for. Long-term target for local-only PDF extraction (to replace the current external-API path under ADR-006). Not yet in the runtime.

Ollama

Reading Order for a New Contributor

If you’re new to the project and need to come up to speed:

  1. Read Overview for the pain narrative.

  2. Read PHI Architecture for the four-tier + 8-action story.

  3. Read Architecture Decisions (ADRs) in full — the Why answers are here.

  4. Read the HIPAA §164.514(b)(2)(i) primary source and the first three sections of NIST SP 800-188 to ground the regulatory vocabulary.

  5. Come back here as a reference when you need to justify or challenge an architectural choice.