Glossary
Short definitions for terms users will see in the portal and docs.
- audit files
Counts and lineage files under output/{STUDY}/audit/. They help the study team review what the pipeline processed and published.
- data dictionary
A study file that explains variables, labels, forms, and allowed values. The assistant uses it to ground questions in study meaning.
- hosted LLM
A model provider outside the user’s machine, such as Anthropic, OpenAI, or Google. Hosted providers require an API key and local study-team approval.
- local LLM
A model running on the user’s own machine, usually through Ollama. This is the recommended starting point when the team wants to avoid external model calls.
- PHI
Protected health information. In this project, raw study files are treated as PHI-bearing unless the study team has verified otherwise.
- PHI key
A local secret used by the scrubber to create stable pseudonyms and date shifts. It lives outside the repository.
- published bundle
The scrubbed study output under output/{STUDY}/trio_bundle/. This is the main bundle the assistant uses for study questions.
- raw study files
The source files placed under data/raw/{STUDY}/. These files are treated as sensitive and are not the assistant’s normal working material.
- scrub
The step that removes, masks, caps, generalizes, or pseudonymizes sensitive dataset fields before publishing the bundle.
- snapshot baseline
A study-team-reviewed cleaned bundle under data/snapshots/{STUDY}/. The portal can restore it when PDF extraction fails or when the user chooses Use Existing Study.
- study name
The folder name for the study, such as Indo-VAP. It is used to find data/raw/{STUDY}/ and write output/{STUDY}/.
- trio bundle
Another name for the published bundle. It contains datasets, dictionary output, and optional PDF-derived variable information.
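The folder locations named in the definitions above all derive from the study name. As a rough illustration only, the sketch below collects them in one place. The helper study_paths and its dictionary keys are hypothetical and are not part of the project's code; only the folder patterns themselves come from this glossary.

```python
from pathlib import PurePosixPath


def study_paths(study: str) -> dict[str, PurePosixPath]:
    """Hypothetical helper: map a study name to the folders named in this glossary."""
    output_root = PurePosixPath("output") / study
    return {
        # Raw study files (treated as PHI-bearing)
        "raw": PurePosixPath("data/raw") / study,
        # Study-team-reviewed snapshot baseline
        "snapshot": PurePosixPath("data/snapshots") / study,
        # Root of everything the pipeline writes for this study
        "output": output_root,
        # Audit files: counts and lineage
        "audit": output_root / "audit",
        # Published (trio) bundle the assistant reads
        "trio_bundle": output_root / "trio_bundle",
    }


if __name__ == "__main__":
    # Print the layout for the example study name used in this glossary.
    for name, path in study_paths("Indo-VAP").items():
        print(f"{name}: {path}")
```

Pure paths are used so the sketch only builds path strings and never touches the file system.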
Developer Terms
If you need definitions for code-level privacy controls or pipeline internals, use the Developer Guide. For IRB/IEC or auditor review, use IRB/Auditor Profile. Those details are intentionally kept out of the user guide.