Plex Archive Project

This project extracts individual posts from the Biweekly Plex Dispatch HTML archives and creates a Massive Wiki for easy navigation and cross-referencing.

Note: Tweaking the extract script and re-running it is in tension with editing the extraction output files directly, because regenerating replaces the generated files along with any manual edits made to them. This will eventually be resolved either by better automation or by deciding never to run the extract script again. Until then, check diffs carefully when changing either the automation or the output files.

The extraction script only generates these files/directories:

Project Structure

/
├── posts/           # Individual post markdown files (614 posts)
├── authors/         # Author index pages (53 authors)
├── topics/          # Topic index pages (30 topics)
├── years/           # Year index pages (2022-2025)
├── issues/          # Original HTML source files (87 issues)
├── scripts/         # Python extraction and utility scripts
├── venv/            # Python virtual environment
├── README.md        # Main archive navigation (Massive Wiki entry point)
└── Project.md       # This technical documentation

Scripts

Main Scripts

Requirements

Usage

Setup

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r scripts/requirements.txt

Generate Archive

source venv/bin/activate
python3 scripts/cleanup.py    # Clean previous run
python3 scripts/extract_posts.py

Features

Content Organization

Topic Detection

Uses confidence-based scoring with:
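
As a rough illustration of confidence-based scoring in general, the sketch below sums keyword weights per topic and keeps only topics that reach a threshold. It is not the project's actual scoring logic: `TOPIC_KEYWORDS`, `score_topics`, the weights, and the threshold are all hypothetical names and values chosen for the example.

```python
import re

# Hypothetical topic table: topic -> {keyword: weight}. Not the project's data.
TOPIC_KEYWORDS = {
    "climate": {"climate": 3, "carbon": 2, "emissions": 2},
    "governance": {"governance": 3, "voting": 2},
}


def score_topics(text, topics=TOPIC_KEYWORDS, threshold=3):
    """Return {topic: score} for topics whose score reaches the threshold."""
    # Tokenize into lowercase alphabetic words so punctuation is ignored.
    words = re.findall(r"[a-z]+", text.lower())
    results = {}
    for topic, keywords in topics.items():
        # Each occurrence of a keyword adds its weight to the topic's score.
        score = sum(weight * words.count(kw) for kw, weight in keywords.items())
        if score >= threshold:
            results[topic] = score
    return results
```

A single weak keyword match stays below the threshold, which is the point of scoring by confidence rather than tagging on any match.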

Author Management

Linking

Data Sources

Development

Adding New Topics

Edit the COMMON_TOPICS dictionary in scripts/extract_posts.py, then regenerate the archive.
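
The shape of COMMON_TOPICS is not documented here, so the following is only a sketch under the assumption that it maps a topic name to a list of lowercase keywords. The entries and the `validate_topics` helper are hypothetical; check the real dictionary in scripts/extract_posts.py before copying this pattern.

```python
# Assumed shape: topic name -> list of lowercase keywords.
# The real COMMON_TOPICS may be structured differently.
COMMON_TOPICS = {
    "Example Topic": ["example", "sample"],
}

# Adding a topic is then one new entry (hypothetical name and keywords).
COMMON_TOPICS["Another Topic"] = ["another", "additional"]


def validate_topics(topics):
    """Return a list of problems found in a topic dictionary (empty if none)."""
    problems = []
    for name, keywords in topics.items():
        if not keywords:
            problems.append(f"{name}: no keywords")
        for kw in keywords:
            # Mixed-case keywords would never match lowercased post text.
            if kw != kw.lower():
                problems.append(f"{name}: keyword {kw!r} is not lowercase")
    return problems
```

Running a check like this before regenerating can catch an empty or mistyped entry before it affects every topic page.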

Author Name Changes

Update the AUTHOR_CONSOLIDATION mapping and the KNOWN_AUTHORS set.
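
One plausible way the two structures fit together, shown only as a sketch: AUTHOR_CONSOLIDATION maps name variants to a canonical name, and KNOWN_AUTHORS holds the canonical names that are accepted. The sample names and the `canonical_author` helper are hypothetical, and the real script may use these structures differently.

```python
# Hypothetical contents; the real values live in the project's scripts.
AUTHOR_CONSOLIDATION = {
    "J. Doe": "Jane Doe",
    "Jane D.": "Jane Doe",
}
KNOWN_AUTHORS = {"Jane Doe", "John Smith"}


def canonical_author(name):
    """Map a raw byline to its canonical author name, or None if unknown."""
    name = name.strip()
    # Fall back to the name itself when no variant mapping exists.
    canonical = AUTHOR_CONSOLIDATION.get(name, name)
    return canonical if canonical in KNOWN_AUTHORS else None
```

Under this assumption, renaming an author means adding the old name as a key in the mapping and making sure the new name is in the set; updating only one of the two would leave bylines unresolved.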

Regeneration

Always run scripts/cleanup.py first to remove generated content while preserving source files and scripts.
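
To make the ordering hard to get wrong, the two commands could be wrapped in a small runner. This is a minimal sketch, not part of the project: the `regenerate` function and `STEPS` list are hypothetical, and it assumes it is run from the repository root with the virtual environment active.

```python
import subprocess
import sys

# Cleanup must come before extraction, as described above.
STEPS = ["scripts/cleanup.py", "scripts/extract_posts.py"]


def regenerate(steps=STEPS, run=subprocess.run):
    """Run each script in order, stopping at the first failure."""
    for script in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so extraction never runs after a failed cleanup.
        run([sys.executable, script], check=True)

# Usage (from the repository root): regenerate()
```

The `run` parameter exists only so the ordering can be checked without executing the real scripts.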