Introduction ============ **For Users: Understanding RePORTaLiN** Overview -------- RePORTaLiN is a tool that helps you convert Excel spreadsheets into a cleaner, more organized format called JSONL. It's designed to be easy to use, fast, and reliable - perfect for handling medical research data without requiring technical expertise. Purpose ------- The RePORTaLiN pipeline addresses the common challenge of converting complex Excel-based research data into a structured, machine-readable format. It's specifically designed for: - Medical research data management - Clinical trial data processing - Data standardization and validation - Research data archiving Key Benefits ------------ **Speed and Efficiency** The pipeline can process 43 Excel files in approximately 15-20 seconds, making it suitable for large-scale data processing tasks. **Intelligent Processing** - Automatic table detection within Excel sheets - Smart handling of empty rows and columns - Automatic data type inference and conversion - Duplicate column name resolution **Robustness** - Comprehensive error handling and recovery - Detailed logging for debugging and auditing - Progress tracking for long-running operations - Graceful handling of edge cases **Flexibility** - Works with any dataset folder automatically - Adjustable paths and settings - Easy to customize for your needs - Options to run specific parts of the process Use Cases --------- RePORTaLiN is ideal for: 1. **Data Migration**: Converting legacy Excel data to modern formats 2. **Data Integration**: Standardizing data from multiple sources 3. **Quality Assurance**: Validating and cleaning research data 4. **Archival**: Creating structured backups of Excel-based data 5. **Analysis Pipeline**: Preparing data for downstream analysis tools System Requirements ------------------- - **Python**: 3.13 or higher - **Operating System**: macOS, Linux, or Windows - **Memory**: Minimal (handles large Excel files efficiently) - **Storage**: Depends on dataset size (outputs are typically smaller than inputs) Design Philosophy ----------------- RePORTaLiN follows these core principles: **Simplicity** One command to run the entire pipeline: ``python main.py`` **Transparency** Comprehensive logging shows exactly what's happening at each step **Reliability** Extensive error handling ensures the pipeline fails gracefully **Maintainability** Clean, well-documented code makes it easy to understand and modify Next Steps ---------- - :doc:`installation`: Set up RePORTaLiN on your system - :doc:`quickstart`: Run your first data extraction - :doc:`configuration`: Customize the pipeline for your needs