Installation

For Users: Getting Started

This guide will help you install RePORTaLiN on your computer in just a few simple steps.

Prerequisites

Before installing RePORTaLiN, ensure you have:

  • Python 3.13 or higher installed on your system

  • pip package manager (included with the python.org installers; some Linux distributions package it separately, e.g. python3-pip)

  • Git (optional, for cloning the repository)

Checking Python Version

To verify your Python version:

python --version
# or
python3 --version

The output should report Python 3.13 or higher, for example: Python 3.13.5
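python --version reports whichever interpreter comes first on your PATH. To check the interpreter you will actually use to run RePORTaLiN, you can run a short script with it. The sketch below compares the running interpreter against the 3.13 minimum stated above. The helper name meets_minimum is hypothetical and not part of RePORTaLiN.

```python
import sys

# Minimum version from the prerequisites above.
MINIMUM = (3, 13)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if version_info (major, minor, ...) is at least `minimum`."""
    # Compare only the components the minimum specifies (major, minor).
    return tuple(version_info[: len(minimum)]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "OK" if meets_minimum() else "too old, 3.13 or higher is required"
    print(f"Python {current}: {verdict}")
```

Run it with the same command (python or python3) that you plan to use for main.py.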

Installation Steps

Step 1: Clone the Repository

If you have Git installed:

git clone https://github.com/solomonsjoseph/RePORTaLiN.git
cd RePORTaLiN

Alternatively, download the ZIP file from GitHub and extract it.

Step 2: Install Dependencies

Install all required packages using pip:

pip install -r requirements.txt

This will install:

Core Dependencies:

  • pandas (≥2.0.0): Data manipulation and Excel reading

  • openpyxl (≥3.1.0): Excel file format support (.xlsx files)

  • numpy (≥1.24.0): Numerical operations

  • tqdm (≥4.66.0): Progress bars and clean console output (required)

Security:

  • cryptography (≥41.0.0): Encryption for de-identification mappings

Documentation (Optional):

  • sphinx (≥7.0.0): Documentation generation

  • sphinx-rtd-theme (≥1.3.0): ReadTheDocs theme

  • sphinx-autodoc-typehints (≥1.24.0): Type hints in docs
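An import check confirms that these packages load, but not that they meet the minimum versions listed. A minimal sketch using the standard library's importlib.metadata can compare installed versions against those minimums. The helper names are hypothetical, and the simplified parser keeps only leading digits of each component, so it is not a full PEP 440 implementation (the third-party packaging library provides one).

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the core and security dependencies above.
REQUIREMENTS = {
    "pandas": "2.0.0",
    "openpyxl": "3.1.0",
    "numpy": "1.24.0",
    "tqdm": "4.66.0",
    "cryptography": "41.0.0",
}


def parse_version(text):
    """Turn '2.2.3' or '1.26.0rc1' into a tuple of ints such as (2, 2, 3).

    Simplified: keeps the leading digits of each dot-separated part and stops
    at the first part without digits (e.g. 'dev0'), ignoring pre-release tags.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed, minimum):
    """Return True if the installed version is >= the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "2.0" compares equal to "2.0.0".
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS):
    """Map each distribution name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name:<14} {status}")
```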

Verifying Installation

To verify the installation was successful:

Option 1: Quick Check

# Check if main modules can be imported
python -c "import pandas, openpyxl, numpy, tqdm, cryptography; print('✅ All dependencies installed successfully!')"

Option 2: Run Help Command

python main.py --help

You should see the usage information without any errors.

Option 3: Test Run

# Run a quick test (make sure you have data in data/dataset/)
python main.py

If you see progress bars and status messages without errors, the installation is successful!

Directory Structure

After installation, your project structure should look like:

RePORTaLiN/
├── main.py                  # Main entry point
├── config.py                # Configuration
├── requirements.txt         # Dependencies
├── Makefile                 # Build commands (optional)
├── README.md                # Project overview
├── scripts/                 # Core modules
│   ├── extract_data.py      # Excel to JSONL extraction
│   ├── load_dictionary.py   # Dictionary processor
│   └── utils/
│       ├── deidentify.py    # De-identification script
│       └── logging.py       # Centralized logging
├── data/                    # Your data files go here
│   ├── dataset/
│   │   └── <dataset_name>/  # Excel files (e.g., Indo-vap_csv_files/)
│   └── data_dictionary_and_mapping_specifications/
│       └── RePORT_DEB_to_Tables_mapping.xlsx
├── results/                 # Output files (created automatically)
│   ├── dataset/             # Extracted JSONL files
│   ├── deidentified/        # De-identified data (if enabled)
│   └── data_dictionary_mappings/  # Dictionary outputs
├── docs/                    # Documentation
│   └── sphinx/              # Sphinx documentation
├── .logs/                   # Execution logs (created automatically)
└── .venv/                   # Virtual environment (if created)
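To confirm that a clone or ZIP extraction is complete, you can check for the files shown in the tree. The sketch below is a hypothetical helper, not part of RePORTaLiN. It checks only entries expected before the first run: results/ and .logs/ are created automatically, and data/ is where you add your own files, so they are left out.

```python
from pathlib import Path

# Source files from the tree above that should exist right after cloning.
EXPECTED = [
    "main.py",
    "config.py",
    "requirements.txt",
    "scripts/extract_data.py",
    "scripts/load_dictionary.py",
    "scripts/utils/deidentify.py",
    "scripts/utils/logging.py",
]


def missing_paths(root="."):
    """Return the expected relative paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```

Run it from the RePORTaLiN directory; an empty result means every listed file is present.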

Troubleshooting Installation

Problem: “pip: command not found”

Solution: Install pip or use python -m pip instead:

# Try using python -m pip
python -m pip install -r requirements.txt

# Or on macOS/Linux
python3 -m pip install -r requirements.txt
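python -m pip runs the pip that belongs to that specific interpreter, which avoids installing packages into a different Python than the one that will run main.py. The same idea from inside Python, as a minimal sketch: build the command from sys.executable and run it with subprocess. The helper names pip_command and install_requirements are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", *args]


def install_requirements(requirements="requirements.txt"):
    """Install a requirements file with the current interpreter's pip.

    check=True makes subprocess.run raise CalledProcessError if pip fails.
    """
    subprocess.run(pip_command("install", "-r", requirements), check=True)
```

Called from the RePORTaLiN directory, install_requirements() is equivalent to running python -m pip install -r requirements.txt with whichever interpreter executes it.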

Problem: “Permission denied” errors

Solution: Use the --user flag or ensure you’re in a virtual environment:

# Option 1: Install with --user flag
pip install --user -r requirements.txt

# Option 2: Use virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows
pip install -r requirements.txt

Problem: Import errors after installation

Solution: Ensure you’re in the correct directory and virtual environment:

# 1. Check current directory
pwd
# Should show: .../RePORTaLiN

# 2. Ensure virtual environment is activated
which python
# Should show: .../RePORTaLiN/.venv/bin/python

# 3. Reinstall dependencies
pip install --force-reinstall -r requirements.txt
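pwd and which python are Unix commands (the Windows Command Prompt uses cd and where python instead). A cross-platform sketch of the same check relies on how venv works: inside a virtual environment, sys.prefix points at the environment directory while sys.base_prefix still points at the base installation, so the two differ. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when running inside a venv-style virtual environment.

    venv sets sys.prefix to the environment directory while sys.base_prefix
    keeps pointing at the base Python installation, so the two differ.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the interpreter path does not end in .venv/bin/python (or .venv\Scripts\python.exe on Windows), activate the environment before reinstalling.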

Problem: “Package ‘cryptography’ not found”

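Solution: cryptography publishes prebuilt wheels for most platforms, but if pip falls back to building it from source, the build needs system libraries (and, for recent releases, a Rust toolchain):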

macOS:

# Install OpenSSL with Homebrew
brew install openssl
pip install cryptography

Ubuntu/Debian:

sudo apt-get install build-essential libssl-dev libffi-dev python3-dev
pip install cryptography

Windows:

# Usually works with pip alone
pip install cryptography
# If issues persist, install Microsoft C++ Build Tools

Problem: Excel file reading errors

Solution: Ensure openpyxl is properly installed:

pip install --upgrade openpyxl

# Test it
python -c "import openpyxl; print('openpyxl version:', openpyxl.__version__)"
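If openpyxl imports correctly but a particular file still fails to load, the file itself may be the problem. openpyxl reads .xlsx/.xlsm workbooks, which are ZIP packages containing a [Content_Types].xml entry; it does not read the legacy .xls format. The minimal standard-library sketch below (a hypothetical helper, not part of RePORTaLiN) flags files under data/dataset/ that lack this structure, such as renamed CSV or .xls files or truncated downloads.

```python
import zipfile
from pathlib import Path


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP package with a [Content_Types].xml entry."""
    if not zipfile.is_zipfile(path):
        # Legacy .xls files, CSVs renamed to .xlsx, and truncated files land here.
        return False
    with zipfile.ZipFile(path) as archive:
        return "[Content_Types].xml" in archive.namelist()


if __name__ == "__main__":
    folder = Path("data/dataset")
    if folder.is_dir():
        for file in sorted(folder.rglob("*.xlsx")):
            status = "ok" if looks_like_xlsx(file) else "NOT a valid .xlsx package"
            print(f"{file}: {status}")
```

This only checks the container structure; a file that passes can still contain content pandas cannot parse.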

Problem: Incompatible Python version

Solution: Install Python 3.13 or higher:

  • macOS: Use Homebrew: brew install python@3.13

  • Ubuntu/Debian: sudo apt-get install python3.13 (if your release does not package 3.13, Ubuntu users commonly add the deadsnakes PPA)

  • Windows: Download from python.org

Upgrading

To upgrade to the latest version:

# Pull latest changes (if using Git)
git pull origin main

# Upgrade dependencies
pip install --upgrade -r requirements.txt

Setting Up Your Data

Before running RePORTaLiN, ensure your data is properly organized:

Step 1: Place Excel Files

Put your Excel data files in:

data/dataset/<your_dataset_name>/

For example:

data/dataset/Indo-vap_csv_files/
├── 1A_ICScreening.xlsx
├── 1B_HCScreening.xlsx
├── 2A_Index_Baseline.xlsx
└── ...

Step 2: Add Data Dictionary

Place your data dictionary Excel file in:

data/data_dictionary_and_mapping_specifications/
└── RePORT_DEB_to_Tables_mapping.xlsx

Step 3: Verify Setup

# Check if files are in place
ls data/dataset/
ls data/data_dictionary_and_mapping_specifications/

The pipeline will automatically detect your dataset folder name and process all Excel files within it.
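The pipeline's own detection logic is not reproduced here. As a hedged approximation for checking your setup before a run, the sketch below assumes the layout described in this section: exactly one dataset folder under data/dataset/ containing .xlsx files, plus the dictionary workbook. The function name find_dataset is hypothetical, and the real pipeline may handle other layouts differently.

```python
from pathlib import Path

# Paths from "Setting Up Your Data"; adjust if your layout differs.
DATASET_ROOT = Path("data/dataset")
DICTIONARY = Path(
    "data/data_dictionary_and_mapping_specifications/RePORT_DEB_to_Tables_mapping.xlsx"
)


def find_dataset(root=DATASET_ROOT):
    """Return (folder, sorted .xlsx files) for the single dataset folder under root.

    Assumes exactly one non-hidden subfolder; raises ValueError otherwise and
    FileNotFoundError if root does not exist.
    """
    folders = [
        p for p in Path(root).iterdir() if p.is_dir() and not p.name.startswith(".")
    ]
    if len(folders) != 1:
        raise ValueError(f"expected one dataset folder in {root}, found {len(folders)}")
    folder = folders[0]
    return folder, sorted(folder.glob("*.xlsx"))


if __name__ == "__main__":
    try:
        folder, files = find_dataset()
    except (FileNotFoundError, ValueError) as exc:
        print("Dataset check failed:", exc)
    else:
        print(f"Dataset folder: {folder.name} ({len(files)} Excel files)")
    print("Dictionary present:", DICTIONARY.is_file())
```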

Next Steps

Now that RePORTaLiN is installed, proceed to: