API Reference

This section provides detailed API documentation for all modules in the RePORTaLiN codebase.

Overview

RePORTaLiN’s API is organized into several modules:

Core Modules

__version__

Single source of truth for version information.

See: __version__ module

main

Main pipeline orchestrator and entry point.

See: main module

config

Configuration management and path resolution.

See: config module

scripts

Core processing modules for data extraction and dictionary loading.

See: scripts package

scripts.utils

Utility functions and classes used across the RePORTaLiN pipeline.

See: scripts.utils package

Quick Reference

Common Functions

Data Extraction:

from scripts.extract_data import extract_excel_to_jsonl, process_excel_file

# Extract all files
extract_excel_to_jsonl(input_dir, output_dir)

# Process single file
process_excel_file(file_path, output_dir)

Dictionary Loading:

from scripts.load_dictionary import load_study_dictionary

# Load data dictionary
load_study_dictionary(excel_file, output_dir)

Configuration:

import config

# Access paths
print(config.DATASET_DIR)
print(config.CLEAN_DATASET_DIR)

Logging:

from scripts.utils import logging as log

log.info("Information message")
log.success("Success message")
log.warning("Warning message")
log.error("Error message")

Common Patterns

Process All Files

from scripts.extract_data import find_excel_files, process_excel_file
from pathlib import Path

input_dir = Path("data/dataset/my_data")
output_dir = Path("results/my_data")

files = find_excel_files(input_dir)
for file in files:
    process_excel_file(file, output_dir)

Read JSONL Output

import pandas as pd

# Read JSONL file
df = pd.read_json('output.jsonl', lines=True)

Custom Processing

import pandas as pd
from scripts.extract_data import convert_dataframe_to_jsonl

# Read and transform
df = pd.read_excel('input.xlsx')
df['new_column'] = df['old_column'].apply(lambda x: x * 2)

# Export
convert_dataframe_to_jsonl(df, 'output.jsonl', 'input.xlsx')

Logging (scripts.utils):

from scripts.utils import logging as log

# Get logger for your module
logger = log.get_logger(__name__)
logger.info("Processing started")
logger.success("Processing completed successfully!")

# Or use quick access functions
from scripts.utils.logging import info, success, warning, error
info("Quick logging message")
success("Operation successful!")

Country Regulations (scripts.utils):

from scripts.utils.country_regulations import get_country_config

# Get country-specific configuration
config = get_country_config('India')
print(f"PII fields: {config.pii_fields}")
print(f"Date format: {config.date_format}")

Module Index