Frequently Asked Questions
What is RePORT AI Portal?
It is a local-first assistant for one clinical research study. It helps the study team turn local source files into a PHI-scrubbed bundle and ask questions about that bundle through a chat interface.
Who is it for?
Clinical researchers, data managers, site operators, PIs, and reviewers working with one locked study.
What can I ask it?
Examples:
How many subjects match a cohort definition?
What variables are available for a study question?
Summarize missingness for a field.
Create a descriptive table or plot.
Run a basic analysis and explain the result in study language.
What files do I need?
At minimum, place study datasets and a data dictionary under
data/raw/{STUDY_NAME}/:
data/raw/Indo-VAP/
├── datasets/
├── data_dictionary/
└── annotated_pdfs/ # optional
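The layout above can be created in one step. A minimal sketch, using the Indo-VAP study name from the example (substitute your own study name):

```shell
# Create the expected raw-data layout for one study.
# "Indo-VAP" is the example study name from this FAQ; replace it with yours.
mkdir -p data/raw/Indo-VAP/datasets
mkdir -p data/raw/Indo-VAP/data_dictionary
mkdir -p data/raw/Indo-VAP/annotated_pdfs   # optional
```

Place your study datasets under datasets/ and the data dictionary under data_dictionary/ before running the pipeline.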
Do I need an API key?
Not if you use local Ollama. Hosted providers such as Anthropic, OpenAI, and Google require API keys.
Start with Ollama when possible:
export LLM_PROVIDER=ollama
export LLM_MODEL=qwen3:8b
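If Ollama is installed locally, the model can be fetched ahead of time so the first chat does not block on a download. A sketch building on the two exports above; the guard makes it harmless on machines without Ollama:

```shell
# Point the portal at a local Ollama model (no API key needed).
export LLM_PROVIDER=ollama
export LLM_MODEL=qwen3:8b

# Pre-pull the model so it is available before the first chat.
# Guarded so this snippet is a no-op if Ollama is not installed.
if command -v ollama >/dev/null 2>&1; then
  ollama pull "$LLM_MODEL"
fi
```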
How do I install it?
Use:
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/solomonsjoseph/RePORT-AI-Portal.git
cd RePORT-AI-Portal
make chat
Then follow Quick Start.
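A quick sanity check, not part of the official Quick Start, is to confirm the tools those steps rely on are on your PATH:

```shell
# Report which of the required tools are installed.
for tool in uv git make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing -- revisit the install steps"
  fi
done
```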
How do I run it?
For the web UI:
make chat
For developer/operator pipeline runs only:
make pipeline
Can I use an existing processed study?
Yes. If output/{STUDY}/trio_bundle/ already exists, the web UI can
use that existing published bundle instead of loading the study again.
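To see whether a published bundle already exists before launching the web UI, a minimal sketch (STUDY is a placeholder; Indo-VAP is the example study name from this FAQ):

```shell
# Check for an already-published bundle for one study.
STUDY=Indo-VAP
if [ -d "output/$STUDY/trio_bundle" ]; then
  echo "Published bundle found; the web UI can reuse it."
else
  echo "No bundle yet; load the study through the web UI first."
fi
```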
Does the assistant read raw files?
The intended user workflow is that the assistant answers from the
published, scrubbed study bundle under output/{STUDY}/trio_bundle/.
Raw files belong under data/raw/{STUDY}/ and are handled by the
loading pipeline.
For the full technical boundary, see PHI Architecture.
What if my PDFs may contain PHI?
Treat them as PHI-bearing by default. Do not set
REPORTALIN_PDF_PHI_FREE=1 unless the study team has verified that the
source PDFs are PHI-free and the use of the selected provider is allowed.
Can I skip PDFs?
Yes. PDFs are optional. The portal can still work from datasets and the data dictionary.
Where do I check what happened during a run?
Start with:
output/{STUDY}/README.md
output/{STUDY}/audit/
the terminal output from make pipeline or make chat
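A minimal sketch for inspecting those locations after a run (STUDY is a placeholder; the fallback message covers the case where no run has produced output yet):

```shell
# List run artifacts for one study, if any exist.
STUDY=Indo-VAP
if [ -d "output/$STUDY" ]; then
  ls -R "output/$STUDY"
else
  echo "No output for $STUDY yet -- run make pipeline or make chat first."
fi
```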
What if I find raw PHI in output?
Stop using and sharing the affected output. Preserve the file path and row context for the study team, but do not paste raw PHI into tickets or chat. Notify the study PI, data manager, and local review contact, then re-run only after the cause is fixed.
Can it handle large datasets?
It is designed for study-scale files, but runtime depends on local hardware, file size, and model choice. If a local model is slow, use a smaller Ollama model or a machine with more memory.
Is there a GUI?
Yes. Run:
make chat
The web UI guides the user through provider selection, study loading, and chat.
Where are developer details?
Use the Developer Guide for architecture, source files, tests, code-level behavior, and contributor workflow. Use IRB/Auditor Profile for reviewer-facing PHI handling and control evidence.