Sphinx Auto-Documentation Guide

For Developers: Automated Documentation System

This guide explains how Sphinx automatically generates documentation from your code and how to extend that automation toward a “write code → instant docs” workflow.

Last Updated: October 23, 2025

Current Automation Status

✅ What’s Already Automated

  1. API Documentation from Docstrings (FULLY AUTOMATED)

    Sphinx autodoc automatically extracts:

    • Function signatures with type hints

    • Docstrings (Google/NumPy style)

    • Class hierarchies and inheritance

    • Module-level documentation

    • Return types and parameters

    Example:

    # In your code: config.py
    from typing import Optional

    def normalize_dataset_name(folder_name: Optional[str]) -> str:
        """
        Normalize a dataset folder name by removing common suffixes.
    
        Args:
            folder_name: Dataset folder name to normalize
    
        Returns:
            Normalized dataset name without common suffixes
        """
        # ... implementation
    

    Result: The function appears on the API page built from docs/sphinx/api/config.rst (_build/html/api/config.html) the next time you run make html!
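Autodoc does not parse your source text; it imports the module and inspects the live objects. The sketch below uses the standard library's inspect module to show roughly what autodoc sees for the function above. The helper autodoc_preview is hypothetical, and the function body is omitted on purpose.

```python
import inspect
from typing import Optional


def normalize_dataset_name(folder_name: Optional[str]) -> str:
    """
    Normalize a dataset folder name by removing common suffixes.

    Args:
        folder_name: Dataset folder name to normalize

    Returns:
        Normalized dataset name without common suffixes
    """
    ...  # implementation omitted; only the signature and docstring matter here


def autodoc_preview(func) -> str:
    """Return the signature line and cleaned docstring, similar to what autodoc extracts."""
    signature = f"{func.__name__}{inspect.signature(func)}"
    docstring = inspect.getdoc(func) or ""  # getdoc strips the shared indentation
    return f"{signature}\n\n{docstring}"


print(autodoc_preview(normalize_dataset_name))
```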

  2. Type Hints Rendering (FULLY AUTOMATED)

    The sphinx-autodoc-typehints extension automatically renders:

    • Function parameters with types

    • Return type annotations

    • Variable type hints

    • Complex types (List, Dict, Optional, etc.)
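The extension works from the annotations Python stores on each function, resolving them much as typing.get_type_hints does. The hypothetical function below uses the complex types listed above; printing its hints shows the information that ends up rendered next to each parameter.

```python
from typing import Dict, List, Optional, get_type_hints


def summarize_columns(
    rows: List[Dict[str, str]], column: Optional[str] = None
) -> Dict[str, int]:
    """Count non-empty values per column (hypothetical example)."""
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            if column is not None and name != column:
                continue  # only count the requested column
            if value:
                counts[name] = counts.get(name, 0) + 1
    return counts


# The resolved annotations that type-hint rendering works from
for name, hint in get_type_hints(summarize_columns).items():
    print(f"{name}: {hint}")
```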

  3. Version Tracking (SEMI-AUTOMATED)

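    Version is pulled automatically from __version__.py; bumping the number in that file is still a manual step, which is why this is only semi-automated:

    # docs/sphinx/conf.py
    import os
    import sys
    sys.path.insert(0, os.path.abspath('../..'))  # assumed: project root, where __version__.py lives
    from __version__ import __version__
    version: str = __version__
    release: str = __version__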
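Sphinx treats the two settings differently: release is the full version string, while version is meant to be the short X.Y form used for |version|. If you want that distinction, one option is to derive version from the single __version__ value. A sketch, assuming a dotted X.Y.Z string; short_version is a hypothetical helper and "1.4.2" a stand-in value:

```python
def short_version(full_version: str) -> str:
    """Return the major.minor part of a dotted version string."""
    parts = full_version.split(".")
    return ".".join(parts[:2])


# In docs/sphinx/conf.py this would read:
#   release = __version__
#   version = short_version(__version__)
__version__ = "1.4.2"  # hypothetical value standing in for __version__.py
release: str = __version__
version: str = short_version(__version__)
print(version, release)
```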
    
  4. Cross-References (AUTOMATED)

    Sphinx automatically creates links between:

    • Function references

    • Class references

    • Module references

    • External library docs (via intersphinx)
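Links to your own objects come from roles such as :func: and :class: inside docstrings; links to other projects come from intersphinx_mapping in conf.py. A minimal sketch, assuming you want the Python and pandas docs (the pandas entry is an assumption based on the DataFrame examples in this guide, and load_clean_table is hypothetical):

```python
# docs/sphinx/conf.py (fragment)
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/docs/", None),
}


def load_clean_table(path: str):
    """Load a table and normalize its dataset name.

    Uses :func:`config.normalize_dataset_name` (a link within this project)
    and returns a :class:`pandas.DataFrame` (a link resolved via intersphinx).
    """
    ...  # hypothetical function; only the docstring roles matter here
```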

❌ What’s Still Manual

  1. User Guides - Manual writing required

    • Tutorials and how-tos

    • Conceptual explanations

    • Examples and workflows

  2. Developer Guides - Manual writing required

    • Architecture decisions

    • Design patterns

    • Best practices

  3. Changelog - Manual updates required

    • Version history

    • Breaking changes

    • Migration guides

How It Works

The Autodoc Pipeline

1. You write code with docstrings
   ↓
2. Sphinx autodoc imports your Python modules
   ↓
3. Extracts docstrings, signatures, and type hints
   ↓
4. Inserts that content into the pages whose .rst files contain automodule directives
   ↓
5. Builds the HTML output

Example Flow:

# Step 1: Write code (config.py)
import os

def ensure_directories() -> None:
    """Create all required output directories.

    This function creates:

    - RESULTS_DIR
    - CLEAN_DATASET_DIR
    - DICTIONARY_JSON_OUTPUT_DIR

    Raises:
        OSError: If directory creation fails
    """
    os.makedirs(RESULTS_DIR, exist_ok=True)  # the *_DIR constants are defined elsewhere in config.py
    # ...

# Step 2: Run Sphinx build (shell)
cd docs/sphinx && make html

# Step 3: Open _build/html/api/config.html; the new function is documented ✅
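Because autodoc imports every documented module, top-level code in a file such as main.py runs during each docs build. Keeping the entry point behind a `__name__ == "__main__"` guard prevents that. The sketch below imports a hypothetical stand-in for main.py under its module name, as autodoc does, and shows the guarded code staying idle.

```python
import importlib.util
import pathlib
import tempfile

# Hypothetical stand-in for main.py: the pipeline only runs when executed as a script
MAIN_SOURCE = '''
RAN = []


def main() -> None:
    """Run the full pipeline (placeholder for the real entry point)."""
    RAN.append("pipeline")


if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "main.py"
    path.write_text(MAIN_SOURCE)
    # Import the file under its module name instead of running it as a script
    spec = importlib.util.spec_from_file_location("main", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(module.RAN)  # the __main__ guard kept main() from running during the import
```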

Current Setup

Sphinx Extensions Enabled

# docs/sphinx/conf.py
extensions = [
    'sphinx.ext.autodoc',          # Auto-generate from docstrings ✅
    'sphinx.ext.viewcode',         # Link to source code ✅
    'sphinx.ext.intersphinx',      # Link to external docs ✅
    'sphinx.ext.napoleon',         # Google/NumPy docstrings ✅
    'sphinx_autodoc_typehints',    # Render type hints ✅
]

Auto-Documentation Files

These files use the automodule directive, so their content is filled in from your code at build time:

docs/sphinx/api/
├── modules.rst                 # Auto-generated module index
├── config.rst                  # Auto-docs for config.py
├── main.rst                    # Auto-docs for main.py
├── scripts.rst                 # Auto-docs for scripts package
├── scripts.deidentify.rst      # Auto-docs for deidentify.py
├── scripts.extract_data.rst    # Auto-docs for extract_data.py
├── scripts.load_dictionary.rst # Auto-docs for load_dictionary.py
└── scripts.utils.*.rst         # Auto-docs for utils modules

Each file contains a directive like this one (from config.rst):

.. automodule:: config
   :members:
   :undoc-members:
   :show-inheritance:
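To avoid repeating these three options in every file, Sphinx's autodoc_default_options setting in conf.py applies defaults to all autodoc directives; options written on an individual directive take precedence. A sketch matching the options above:

```python
# docs/sphinx/conf.py (fragment)
autodoc_default_options = {
    "members": True,
    "undoc-members": True,
    "show-inheritance": True,
}
```

With this in place, a file like config.rst only needs the bare `.. automodule:: config` line.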

Enhancing Automation

🚀 Level 1: Watch Mode (AVAILABLE NOW)

Auto-rebuild documentation when files change:

# Install sphinx-autobuild
pip install sphinx-autobuild

# Run in watch mode
cd docs/sphinx
sphinx-autobuild . _build/html

# Serves the docs at http://127.0.0.1:8000 and reloads the page after each rebuild ✨ (add --open-browser to launch a browser tab)

Makefile target (add this to your top-level Makefile; the recipe line must be indented with a tab, not spaces):

.PHONY: docs-watch
docs-watch:
	@cd docs/sphinx && sphinx-autobuild . _build/html --open-browser

Then just:

make docs-watch

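Now whenever you save a file that sphinx-autobuild is watching, the docs rebuild automatically. By default it watches only the source directory you pass it (docs/sphinx here), so to rebuild when Python modules elsewhere in the repository change, add their directories with --watch, for example sphinx-autobuild . _build/html --watch ../../scripts.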
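🚀 Level 2: Post-Commit Git Hook (QUICK WIN)

The checklist lists a post-commit git hook as a quick win. One possible sketch, written in Python: save it as .git/hooks/post-commit, make it executable, and it rebuilds the HTML docs after any commit that touches .py or .rst files. The file filter and the docs/sphinx path are assumptions based on this guide's layout, and the helper names are hypothetical.

```python
#!/usr/bin/env python3
"""Post-commit hook sketch: rebuild the Sphinx HTML docs after relevant commits."""
import os
import subprocess
from typing import Iterable, List

DOCS_DIR = "docs/sphinx"  # assumption: Sphinx project location, relative to the repo root
WATCHED_SUFFIXES = (".py", ".rst")  # assumption: file types that can change the docs


def needs_docs_rebuild(changed_paths: Iterable[str]) -> bool:
    """Return True if any committed file could affect the generated documentation."""
    return any(path.endswith(WATCHED_SUFFIXES) for path in changed_paths)


def files_in_last_commit() -> List[str]:
    """List files changed by HEAD; returns [] if git is unavailable or fails."""
    try:
        result = subprocess.run(
            ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", "HEAD"],
            capture_output=True, text=True,
        )
    except OSError:  # git executable not found
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line]


def main() -> None:
    if not needs_docs_rebuild(files_in_last_commit()):
        return
    if not os.path.isdir(DOCS_DIR):
        print(f"Skipping docs build: {DOCS_DIR} not found")
        return
    print("Rebuilding docs...")
    # post-commit cannot cancel the commit, so a failed build is only reported
    build = subprocess.run(["make", "html"], cwd=DOCS_DIR)
    if build.returncode != 0:
        print("Docs build failed; the commit itself was still recorded.")


if __name__ == "__main__":
    main()
```

Run chmod +x .git/hooks/post-commit once. Git runs hooks from the top of the working tree, which is why DOCS_DIR is relative to the repository root.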

🚀 Level 3: CI/CD Auto-Deploy (PRODUCTION)

Automatically build and deploy docs on every push:

GitHub Actions Example (.github/workflows/docs.yml):

name: Build and Deploy Docs

on:
  push:
    branches: [main]

jobs:
  build-docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # lets the deploy step push to the gh-pages branch
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.13'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install sphinx sphinx-autodoc-typehints sphinx_rtd_theme

      - name: Build documentation
        run: |
          cd docs/sphinx
          make html

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v4
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/sphinx/_build/html

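Result: Push code → Docs auto-build → Deploy to web! 🌐 (In the repository settings, point GitHub Pages at the gh-pages branch, which is where this action publishes by default.)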

🚀 Level 4: Docstring Quality Checks (AUTOMATION)

Ensure docstrings exist and are properly formatted:

pydocstyle check:

# Install pydocstyle
pip install pydocstyle

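# Check docstring quality (Google convention, matching the style used in this guide)
pydocstyle --convention=google scripts/

Add it to a pre-commit hook (save it as .git/hooks/pre-commit, then run chmod +x .git/hooks/pre-commit):

#!/bin/bash
# .git/hooks/pre-commit

echo "Checking docstrings..."
pydocstyle --convention=google scripts/ || {
    echo "❌ Docstring issues found!"
    exit 1
}
echo "✅ Docstrings OK"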
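pydocstyle checks how docstrings are written; the checklist item "Add coverage reports for documentation" asks a different question: how many public functions and classes have a docstring at all. A minimal sketch using only the standard library's ast module; treating underscore-prefixed names as private is an assumption, and docstring_coverage is a hypothetical helper.

```python
import ast
from typing import List, Tuple


def docstring_coverage(source: str) -> Tuple[int, int, List[str]]:
    """Count public functions/classes in `source` and how many have docstrings.

    Returns (documented, total, names_missing_docstrings).
    """
    tree = ast.parse(source)
    documented, total, missing = 0, 0, []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name.startswith("_"):
                continue  # treat underscore-prefixed names as private
            total += 1
            if ast.get_docstring(node):
                documented += 1
            else:
                missing.append(node.name)
    return documented, total, missing


example = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass

def _private():
    pass

class Loader:
    """Loads things."""
    def load(self):
        return 1
'''
documented, total, missing = docstring_coverage(example)
print(f"{documented}/{total} documented; missing: {missing}")
```

Pass the text of each module (for example pathlib.Path("scripts/extract_data.py").read_text()) to get a per-file figure. Sphinx also ships sphinx.ext.coverage, which produces a related report through its coverage builder.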

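🚀 Level 5: Auto-Generate Changelog (ADVANCED)

Generate the changelog from commit messages. Both tools below work best when commits follow a consistent format; the angular preset expects Conventional Commits-style subjects such as feat: or fix:.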

Install conventional-changelog:

npm install -g conventional-changelog-cli

# Generate changelog
conventional-changelog -p angular -i CHANGELOG.md -s

Or use Python:

pip install gitchangelog
gitchangelog > docs/sphinx/changelog.rst
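The core idea behind these tools can be sketched in a few lines of Python: parse each commit subject in the type(scope): description form and group the entries by type. The section names and sample messages below are invented for illustration, and the function names are hypothetical; the real tools also handle versions, breaking-change notes, and links.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches subjects like "feat(extract): add JSONL export" or "fix: handle empty rows";
# a "!" breaking-change marker is accepted but not highlighted here
COMMIT_PATTERN = re.compile(r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?!?: (?P<desc>.+)$")
SECTION_TITLES = {"feat": "Features", "fix": "Bug Fixes", "docs": "Documentation"}  # assumed mapping


def group_commits(subjects: List[str]) -> Dict[str, List[str]]:
    """Group commit subjects into changelog sections; unrecognized types are skipped."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for subject in subjects:
        match = COMMIT_PATTERN.match(subject)
        if not match or match["type"] not in SECTION_TITLES:
            continue
        entry = match["desc"]
        if match["scope"]:
            entry = f"**{match['scope']}:** {entry}"
        sections[SECTION_TITLES[match["type"]]].append(entry)
    return dict(sections)


def to_rst(sections: Dict[str, List[str]]) -> str:
    """Render grouped entries as reStructuredText for a changelog.rst page."""
    lines: List[str] = []
    for title, entries in sections.items():
        lines += [title, "-" * len(title), ""]
        lines += [f"* {entry}" for entry in entries]
        lines.append("")
    return "\n".join(lines)


commits = [
    "feat(extract): add JSONL export",
    "fix: handle empty Excel rows",
    "chore: bump dependencies",
]
print(to_rst(group_commits(commits)))
```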

Best Practices for Auto-Documentation

Write Good Docstrings

Use Google or NumPy style consistently:

from typing import Any, Dict

import pandas as pd


def process_data(input_file: str, options: Dict[str, Any]) -> pd.DataFrame:
    """Process input data file with specified options.

    This function reads an Excel file and applies various transformations
    based on the provided options dictionary.

    Args:
        input_file: Path to input Excel file
        options: Dictionary of processing options with keys:
            - 'validate': bool - Enable validation
            - 'clean': bool - Remove empty rows

    Returns:
        Processed DataFrame with cleaned data

    Raises:
        FileNotFoundError: If input file doesn't exist
        ValueError: If options are invalid

    Example:
        >>> df = process_data('data.xlsx', {'validate': True})
        >>> len(df)
        100

    Note:
        This function modifies data in-place. Make a copy if needed.

    See Also:
        validate_data: Validation function used internally
    """
    # ... implementation
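Example sections written as >>> sessions double as tests: the standard library's doctest module runs them and compares the printed output. process_data needs a real Excel file, so this sketch uses a small hypothetical function instead.

```python
import doctest


def strip_suffix(name: str, suffix: str = "_dataset") -> str:
    """Remove ``suffix`` from the end of ``name`` if present.

    Example:
        >>> strip_suffix("patients_dataset")
        'patients'
        >>> strip_suffix("patients")
        'patients'
    """
    return name[: -len(suffix)] if suffix and name.endswith(suffix) else name


# Collect and run the >>> examples in the docstring; failures are printed
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(strip_suffix, globs={"strip_suffix": strip_suffix}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(f"{results.attempted} examples run, {results.failed} failed")
```

For a whole package, python -m doctest on a module file or pytest's --doctest-modules option applies the same checks.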

Use Type Hints Everywhere

from typing import Optional, List, Dict, Any

def get_dataset_folder() -> Optional[str]:
    """Get the first dataset folder."""
    # Type hint automatically appears in docs!

Add Module-Level Documentation

"""
Data Extraction Module
======================

This module provides functions for extracting data from Excel files
and converting to JSONL format.

Key Functions:
    - extract_excel_to_jsonl: Main extraction function
    - process_excel_file: Single file processor
    - clean_record_for_json: Data cleaning

Example:
    >>> from scripts.extract_data import extract_excel_to_jsonl
    >>> extract_excel_to_jsonl(input_dir, output_dir)
"""

Use Explicit __all__ Exports

__all__ = [
    'extract_excel_to_jsonl',
    'process_excel_file',
    'clean_record_for_json',
]

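Only these names are exported by from module import *. autodoc's :members: option respects __all__ as well: only the listed names are documented, unless the directive adds :ignore-module-all:.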
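To see the effect, the sketch below builds a throwaway module in memory (demo_extract and unlisted_function are hypothetical), registers it, and star-imports it. Only the name listed in __all__ comes through, even though unlisted_function is also public.

```python
import sys
import types

source = '''
__all__ = ["extract_excel_to_jsonl"]

def extract_excel_to_jsonl():
    """Listed in __all__, so exported and documented."""

def unlisted_function():
    """Public name, but not in __all__."""
'''

module = types.ModuleType("demo_extract")  # hypothetical module, built in memory
exec(source, module.__dict__)
sys.modules["demo_extract"] = module  # make it importable by name

namespace: dict = {}
exec("from demo_extract import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```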

Current Workflow

Immediate Auto-Documentation

Right now, you can already do this:

# 1. Write code with docstrings
vim config.py

# 2. Build docs (reads your code automatically)
cd docs/sphinx && make html

# 3. View updated docs (open is macOS-specific; use xdg-open on Linux)
open _build/html/api/config.html

Your docstrings → Instant API docs! ✅

Implementation Checklist

Quick Wins (Do Now)

☐ Install sphinx-autobuild
☐ Add docs-watch target to Makefile
☐ Create post-commit git hook
☐ Document the workflow for team

Medium Term

☐ Set up GitHub Actions for auto-deploy
☐ Add pydocstyle to pre-commit hooks
☐ Create docstring templates/snippets
☐ Add coverage reports for documentation

Long Term

☐ Auto-generate changelog from commits
☐ Set up Read the Docs hosting
☐ Add API diff detection for breaking changes
☐ Implement version-specific documentation

Summary

You Already Have:

✅ Auto-documentation from docstrings (autodoc)
✅ Type hints rendering (sphinx-autodoc-typehints)
✅ Cross-references and linking
✅ Multiple output formats (HTML, PDF)

You Can Add:

🚀 Watch mode for instant rebuilds
🚀 Git hooks for automatic updates
🚀 CI/CD for automatic deployment
🚀 Quality checks for docstrings
🚀 Automated changelog generation

The Goal:

Write code β†’ Save file β†’ Docs update automatically ✨

With sphinx-autobuild in watch mode (pointed at your code directories with --watch), you already have this loop locally!


TL;DR: Yes! Sphinx already auto-generates API docs from your code. Install sphinx-autobuild for instant updates while you code! 🚀