Sphinx Auto-Documentation Guide
=================================
**For Developers: Automated Documentation System**
This guide explains how Sphinx automatically generates documentation from your code
and how to enhance automation for "write code → instant docs" workflow.
**Last Updated:** October 23, 2025
Current Automation Status
--------------------------
✅ What's Already Automated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. **API Documentation from Docstrings** (FULLY AUTOMATED)
Sphinx `autodoc` automatically extracts:
- Function signatures with type hints
- Docstrings (Google/NumPy style)
- Class hierarchies and inheritance
- Module-level documentation
- Return types and parameters
**Example:**
.. code-block:: python
# In your code: config.py
def normalize_dataset_name(folder_name: Optional[str]) -> str:
"""
Normalize a dataset folder name by removing common suffixes.
Args:
folder_name: Dataset folder name to normalize
Returns:
Normalized dataset name without common suffixes
"""
# ... implementation
**Result:** Automatically appears in ``docs/sphinx/api/config.rst`` when you run ``make html``!
2. **Type Hints Rendering** (FULLY AUTOMATED)
The ``sphinx-autodoc-typehints`` extension automatically renders:
- Function parameters with types
- Return type annotations
- Variable type hints
- Complex types (List, Dict, Optional, etc.)
3. **Version Tracking** (SEMI-AUTOMATED)
Version is automatically pulled from ``__version__.py``:
.. code-block:: python
# docs/sphinx/conf.py
from __version__ import __version__
version: str = __version__
release: str = __version__
4. **Cross-References** (AUTOMATED)
Sphinx automatically creates links between:
- Function references
- Class references
- Module references
- External library docs (via intersphinx)
❌ What's Still Manual
~~~~~~~~~~~~~~~~~~~~~~~
1. **User Guides** - Manual writing required
- Tutorials and how-tos
- Conceptual explanations
- Examples and workflows
2. **Developer Guides** - Manual writing required
- Architecture decisions
- Design patterns
- Best practices
3. **Changelog** - Manual updates required
- Version history
- Breaking changes
- Migration guides
How It Works
------------
The Autodoc Pipeline
~~~~~~~~~~~~~~~~~~~~
.. code-block:: text
1. You write code with docstrings
↓
2. Sphinx autodoc reads Python source
↓
3. Extracts docstrings, signatures, types
↓
4. Generates .rst documentation
↓
5. Builds HTML automatically
**Example Flow:**
.. code-block:: python
# Step 1: Write code (config.py)
def ensure_directories() -> None:
"""Create all required output directories.
This function creates:
- RESULTS_DIR
- CLEAN_DATASET_DIR
- DICTIONARY_JSON_OUTPUT_DIR
Raises:
OSError: If directory creation fails
"""
os.makedirs(RESULTS_DIR, exist_ok=True)
# ...
.. code-block:: bash
# Step 2: Run Sphinx build
cd docs/sphinx && make html
# Step 3: Documentation is automatically generated! ✅
Current Setup
-------------
Sphinx Extensions Enabled
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
# docs/sphinx/conf.py
extensions = [
'sphinx.ext.autodoc', # Auto-generate from docstrings ✅
'sphinx.ext.viewcode', # Link to source code ✅
'sphinx.ext.intersphinx', # Link to external docs ✅
'sphinx.ext.napoleon', # Google/NumPy docstrings ✅
'sphinx_autodoc_typehints', # Render type hints ✅
]
Auto-Documentation Files
~~~~~~~~~~~~~~~~~~~~~~~~~
These files use ``automodule`` directive to auto-generate content:
.. code-block:: text
docs/sphinx/api/
├── modules.rst # Auto-generated module index
├── config.rst # Auto-docs for config.py
├── main.rst # Auto-docs for main.py
├── scripts.rst # Auto-docs for scripts package
├── scripts.deidentify.rst # Auto-docs for deidentify.py
├── scripts.extract_data.rst # Auto-docs for extract_data.py
├── scripts.load_dictionary.rst # Auto-docs for load_dictionary.py
└── scripts.utils.*.rst # Auto-docs for utils modules
Each uses:
.. code-block:: rst
.. automodule:: config
:members:
:undoc-members:
:show-inheritance:
Enhancing Automation
---------------------
🚀 Level 1: Watch Mode (AVAILABLE NOW)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Auto-rebuild documentation when files change:
.. code-block:: bash
# Install sphinx-autobuild
pip install sphinx-autobuild
# Run in watch mode
cd docs/sphinx
sphinx-autobuild . _build/html
# Opens browser, auto-refreshes on code changes! ✨
**Makefile target (add this):**
.. code-block:: makefile
.PHONY: docs-watch
docs-watch:
@cd docs/sphinx && sphinx-autobuild . _build/html --open-browser
Then just:
.. code-block:: bash
make docs-watch
Now whenever you save a Python file with docstrings, the docs rebuild automatically!
🚀 Level 2: Git Hook Integration (RECOMMENDED)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Automatically rebuild docs when you commit code changes:
**Create `.git/hooks/post-commit`:**
.. code-block:: bash
#!/bin/bash
# Auto-rebuild documentation after code commits
echo "🔧 Rebuilding documentation..."
cd docs/sphinx
make html
echo "✅ Documentation updated!"
.. code-block:: bash
chmod +x .git/hooks/post-commit
Now docs rebuild every time you commit! ✨
🚀 Level 3: CI/CD Auto-Deploy (PRODUCTION)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Automatically build and deploy docs on every push:
**GitHub Actions Example (.github/workflows/docs.yml):**
.. code-block:: yaml
name: Build and Deploy Docs
on:
push:
branches: [main]
jobs:
build-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.13'
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install sphinx sphinx_rtd_theme
- name: Build documentation
run: |
cd docs/sphinx
make html
- name: Deploy to GitHub Pages
uses: peaceiris/actions-gh-pages@v3
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: docs/sphinx/_build/html
**Result:** Push code → Docs auto-build → Deploy to web! 🌐
🚀 Level 4: Docstring Quality Checks (AUTOMATION)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ensure docstrings exist and are properly formatted:
**pydocstyle check:**
.. code-block:: bash
# Install pydocstyle
pip install pydocstyle
# Check docstring quality
pydocstyle scripts/
**Add to pre-commit hook:**
.. code-block:: bash
#!/bin/bash
# .git/hooks/pre-commit
echo "Checking docstrings..."
pydocstyle scripts/ || {
echo "❌ Docstring issues found!"
exit 1
}
echo "✅ Docstrings OK"
🚀 Level 5: Auto-Generate Changelog (ADVANCED)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Auto-generate changelog from commit messages:
**Install conventional-changelog:**
.. code-block:: bash
npm install -g conventional-changelog-cli
# Generate changelog
conventional-changelog -p angular -i CHANGELOG.md -s
**Or use Python:**
.. code-block:: bash
pip install gitchangelog
gitchangelog > docs/sphinx/changelog.rst
Best Practices for Auto-Documentation
--------------------------------------
Write Good Docstrings
~~~~~~~~~~~~~~~~~~~~~~
**Use Google or NumPy style consistently:**
.. code-block:: python
def process_data(input_file: str, options: Dict[str, Any]) -> pd.DataFrame:
"""Process input data file with specified options.
This function reads an Excel file and applies various transformations
based on the provided options dictionary.
Args:
input_file: Path to input Excel file
options: Dictionary of processing options with keys:
- 'validate': bool - Enable validation
- 'clean': bool - Remove empty rows
Returns:
Processed DataFrame with cleaned data
Raises:
FileNotFoundError: If input file doesn't exist
ValueError: If options are invalid
Example:
>>> df = process_data('data.xlsx', {'validate': True})
>>> len(df)
100
Note:
This function modifies data in-place. Make a copy if needed.
See Also:
validate_data: Validation function used internally
"""
# ... implementation
Use Type Hints Everywhere
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
from typing import Optional, List, Dict, Any
def get_dataset_folder() -> Optional[str]:
"""Get the first dataset folder."""
# Type hint automatically appears in docs!
Add Module-Level Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
"""
Data Extraction Module
======================
This module provides functions for extracting data from Excel files
and converting to JSONL format.
Key Functions:
- extract_excel_to_jsonl: Main extraction function
- process_excel_file: Single file processor
- clean_record_for_json: Data cleaning
Example:
>>> from scripts.extract_data import extract_excel_to_jsonl
>>> extract_excel_to_jsonl(input_dir, output_dir)
"""
Use Explicit __all__ Exports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python
__all__ = [
'extract_excel_to_jsonl',
'process_excel_file',
'clean_record_for_json',
]
Only these appear in ``from module import *`` and are prioritized in docs.
Current Workflow
----------------
Immediate Auto-Documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Right now, you can already do this:
.. code-block:: bash
# 1. Write code with docstrings
vim config.py
# 2. Build docs (reads your code automatically)
cd docs/sphinx && make html
# 3. View updated docs
open _build/html/api/config.html
**Your docstrings → Instant API docs!** ✅
Recommended Workflow
~~~~~~~~~~~~~~~~~~~~
.. code-block:: bash
# Terminal 1: Watch mode
make docs-watch
# Terminal 2: Write code
vim scripts/deidentify.py
# Add/update docstrings
# Save file
# → Browser automatically refreshes with new docs! ✨
Implementation Checklist
------------------------
Quick Wins (Do Now)
~~~~~~~~~~~~~~~~~~~
.. code-block:: text
☐ Install sphinx-autobuild
☐ Add docs-watch target to Makefile
☐ Create post-commit git hook
☐ Document the workflow for team
Medium Term
~~~~~~~~~~~
.. code-block:: text
☐ Set up GitHub Actions for auto-deploy
☐ Add pydocstyle to pre-commit hooks
☐ Create docstring templates/snippets
☐ Add coverage reports for documentation
Long Term
~~~~~~~~~
.. code-block:: text
☐ Auto-generate changelog from commits
☐ Set up Read the Docs hosting
☐ Add API diff detection for breaking changes
☐ Implement version-specific documentation
Summary
-------
**You Already Have:**
✅ Auto-documentation from docstrings (``autodoc``)
✅ Type hints rendering (``sphinx-autodoc-typehints``)
✅ Cross-references and linking
✅ Multiple output formats (HTML, PDF)
**You Can Add:**
🚀 Watch mode for instant rebuilds
🚀 Git hooks for automatic updates
🚀 CI/CD for automatic deployment
🚀 Quality checks for docstrings
🚀 Automated changelog generation
**The Goal:**
.. code-block:: text
Write code → Save file → Docs update automatically ✨
With `sphinx-autobuild` in watch mode, **you're already 90% there!**
Related Documentation
---------------------
- :doc:`documentation_style_guide` - Documentation standards
- :doc:`contributing` - Contribution guidelines
- :doc:`script_reorganization` - Project organization
External Resources
------------------
- `Sphinx Documentation `_
- `sphinx-autobuild `_
- `Google Style Guide `_
- `Read the Docs `_
---
**TL;DR:** Yes! Sphinx already auto-generates API docs from your code. Install ``sphinx-autobuild`` for instant updates while you code! 🚀