From a39ec566201770e10a8b440211adaf80704a57b8 Mon Sep 17 00:00:00 2001 From: Matt Date: Tue, 28 Apr 2026 00:06:49 +0000 Subject: [PATCH] Docs: domain assignment guide, migration runbook, blast radius - domain-assignment.md: algorithm walkthrough (pass 1/2/3), status values, CLI command reference, dashboard review guide - migration-runbook.md: step-by-step deploy with pre-deploy backups, 8 STOP pause points for operator verification, staged push rollout, quarantined --reprocess-missing procedure, 5 rollback procedures - deploy-blast-radius.md: per-step risk reference with worst case, detection signals, rollback procedures, and risk tiers Co-Authored-By: Claude Opus 4.6 --- docs/deploy-blast-radius.md | 20 ++ docs/domain-assignment.md | 111 ++++++++++ docs/migration-runbook.md | 426 ++++++++++++++++++++++++++++++++++++ 3 files changed, 557 insertions(+) create mode 100644 docs/deploy-blast-radius.md create mode 100644 docs/domain-assignment.md create mode 100644 docs/migration-runbook.md diff --git a/docs/deploy-blast-radius.md b/docs/deploy-blast-radius.md new file mode 100644 index 0000000..fd6bc79 --- /dev/null +++ b/docs/deploy-blast-radius.md @@ -0,0 +1,20 @@ +# Deploy Blast Radius Reference + +Quick-reference for operators during deployment of domain categorization. + +| Step | What Changes | Worst Case (Partial Failure) | Detection Signal | Rollback | Est. 
Rollback Time | +|------|-------------|------------------------------|-----------------|----------|-------------------| +| **Plugin install** | PeerTube plugin dir on CT 110 | PeerTube fails to start | `systemctl status peertube` shows failed | Move plugin dir to `.disabled`, restart PeerTube | 2 min | +| **PeerTube restart** | PeerTube service state | PeerTube crash loop | `journalctl -u peertube` shows repeated failures | Disable plugin, restart | 2 min | +| **Schema migration** (RECON restart) | 4 new nullable columns + 1 index in recon.db | Migration SQL error leaves partial columns | Python PRAGMA check fails | DROP COLUMN for each added column | 5 min | +| **--backfill** | `recon_domain` + `recon_domain_status` on ~22K rows | Wrong domain assignments | Spot-check 20 random docs | `UPDATE documents SET recon_domain = NULL, recon_domain_status = NULL ...` | 1 min | +| **--tiebreaker-pass** | ~1,100 rows: `tied_pass_1` to `tied_pass_2`/`tied_manual` | Wrong tiebreaker resolution | Spot-check 5 resolved items | Reset `tied_pass_2`/`tied_manual` back to `tied_pass_1` | 1 min | +| **--push-pending** | PeerTube `video.category` column on ~22K rows | Wrong categories visible to all PeerTube users | PeerTube UI shows wrong labels | `UPDATE video SET category = NULL WHERE category >= 100` + clear push timestamps | 2 min | +| **--reprocess-missing** | **DELETES** concept directories (irreversible locally) | Concepts deleted, re-enrichment fails (Gemini API down, quota hit) | `recon.py status` shows stuck `queued` items, concept dirs missing | Restore from Contabo backup (`rsync`) | 10-60 min depending on count | + +## Risk Tiers + +- **Low risk (read-only):** `--dry-run` on any command, status display +- **Medium risk (DB-only, reversible):** `--backfill`, `--tiebreaker-pass`, schema migration +- **High risk (external writes):** `--push-pending` (writes to PeerTube, visible to users) +- **Critical risk (destructive):** `--reprocess-missing` (deletes concept files, $130+ 
Gemini work at risk) diff --git a/docs/domain-assignment.md b/docs/domain-assignment.md new file mode 100644 index 0000000..394a652 --- /dev/null +++ b/docs/domain-assignment.md @@ -0,0 +1,111 @@ +# Domain Assignment — Algorithm & Operations Guide + +## Overview + +RECON's domain assignment feature maps each PeerTube video to one of 18 knowledge domains by analyzing the concepts extracted from its transcript. Assignments are pushed to PeerTube as category metadata via a custom plugin. + +## Algorithm + +### Pass 1: Concept Domain Count (inline, per-document) + +Runs automatically via post-embed hook when a video completes the pipeline, or in bulk via `--backfill`. + +1. Read all `data/concepts/{hash}/window_*.json` files +2. Count domain occurrences across all concepts, filtering to `VALID_DOMAINS` only (skips legacy domains) +3. If no valid concepts → `needs_reprocess` +4. If single top domain → `assigned` +5. If tied → `tied_pass_1` (deferred to tiebreaker) + +### Pass 2: Channel Tiebreaker (batch) + +Runs via `assign-categories --tiebreaker-pass`. + +For each `tied_pass_1` document: + +1. Identify the tied domains +2. Look up the document's channel (`catalogue.category`) +3. **Mega-channel rule:** If channel has >500 videos, skip tiebreaking → `tied_manual` +4. Read concept files for all other videos in the same channel +5. Among the tied domains only, pick the one with the highest channel-wide concept count +6. If resolved → `tied_pass_2` +7. If still tied → proceed to pass 3 + +### Pass 3: Defensive Re-Run + +If pass 2 does not resolve the tie, re-read the same channel concept files and re-run identical counting logic. This catches concept-file changes that occurred mid-run (e.g. concurrent enrichment writing new windows during the batch). In steady state, pass 3 produces the same result as pass 2, but under concurrent writes it can resolve a tie that pass 2 missed. 
+ +- If resolved → `tied_pass_2` (same status — the column tracks "channel scan resolved it") +- If still tied → `tied_manual` (alphabetical fallback assigned, flagged for review) + +### Mega-Channel Rule + +Channels with >500 videos (like the "Transcript" catch-all with ~9,200 videos) are not topically coherent. Scanning their concepts produces meaningless aggregate data. These go straight to `tied_manual` for dashboard review. + +## Status Values + +| Status | Meaning | Next Action | +|--------|---------|-------------| +| `assigned` | Clear winner from pass 1 | Push to PeerTube | +| `tied_pass_1` | Concept tie, awaiting tiebreaker | Run `--tiebreaker-pass` | +| `tied_pass_2` | Resolved by channel tiebreaker | Push to PeerTube | +| `tied_manual` | Needs human review | Review at `/peertube/review` | +| `needs_reprocess` | Missing concepts or only legacy domains | Run `--reprocess-missing` | +| `manual_assigned` | Human override from dashboard | Already pushed | + +**"Categorized" filter** = `{'assigned', 'tied_pass_2', 'manual_assigned'}` + +## CLI Commands + +```bash +cd /opt/recon && source venv/bin/activate + +# Show current assignment status +python3 recon.py assign-categories + +# Pass 1: backfill all unassigned complete stream documents +python3 recon.py assign-categories --backfill --dry-run +python3 recon.py assign-categories --backfill + +# Pass 2: resolve ties via channel analysis +python3 recon.py assign-categories --tiebreaker-pass + +# Push all assigned-but-unpushed categories to PeerTube API +python3 recon.py assign-categories --push-pending + +# Re-queue items with missing/legacy concepts +python3 recon.py assign-categories --reprocess-missing + +# Limit processing count +python3 recon.py assign-categories --backfill --limit 100 +``` + +## Dashboard Review + +The review UI at `recon.echo6.co/peertube/review` shows only `tied_manual` items. 
Each row displays: +- Video title and channel +- Top concept domains with counts +- Dropdown to select the correct domain +- Assign button (pushes to PeerTube immediately) + +Items with `needs_reprocess` status do NOT appear in the review UI — they are handled exclusively via the CLI `--reprocess-missing` command. + +## Pipeline Integration + +New videos ingested via the PeerTube collector are automatically assigned a domain when they complete the embed stage. The post-embed hook in `embedder.py`: + +1. Runs `compute_assignment()` (pass 1 only) +2. If clear winner: pushes category to PeerTube immediately +3. If tied: marks as `tied_pass_1` for the next tiebreaker batch run +4. On error: logs warning and continues — does not block the pipeline + +## Source Files + +| File | Purpose | +|------|---------| +| `lib/recon_domains.py` | Domain↔Category ID mapping, VALID_DOMAINS | +| `lib/domain_assigner.py` | `compute_assignment()` + `run_tiebreaker_pass()` | +| `lib/peertube_writer.py` | OAuth2 client, `push_category()`, `push_pending()` | +| `lib/embedder.py` | Post-embed hook | +| `lib/status.py` | DB columns + helper methods | +| `lib/api.py` | Dashboard review routes | +| `recon.py` | CLI `assign-categories` command | diff --git a/docs/migration-runbook.md b/docs/migration-runbook.md new file mode 100644 index 0000000..c962874 --- /dev/null +++ b/docs/migration-runbook.md @@ -0,0 +1,426 @@ +# Domain Categorization Migration Runbook + +Step-by-step procedure to deploy the PeerTube domain categorization feature. + +## Prerequisites + +- Feature branch `feature/peertube-domain-categorization` merged to master (or checked out) +- SSH access to recon-vm (192.168.1.130) and CT 110 (192.168.1.170) +- PeerTube admin credentials (`root` / password in `.env`) + +## Pre-Deploy Backups + +These backups MUST be completed before any state-changing step. + +### 1. 
Snapshot RECON database + +```bash +ssh zvx@192.168.1.130 +cp /opt/recon/data/recon.db "/opt/recon/data/recon.db.pre-domain-feature.$(date +%Y%m%d_%H%M%S).bak" +ls -la /opt/recon/data/recon.db.pre-domain-feature.*.bak # Confirm +``` + +### 2. Snapshot PeerTube PostgreSQL + +```bash +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres pg_dump peertube_prod' > "/tmp/peertube_prod.pre-domain-feature.$(date +%Y%m%d_%H%M%S).sql" +ls -la /tmp/peertube_prod.pre-domain-feature.*.sql # Confirm non-zero +``` + +### 3. Verify off-site concept backup + +```bash +# Check last rsync to Contabo +ssh zvx@192.168.1.130 'ls -la /opt/recon/data/concepts/ | tail -5' +ssh root@100.64.0.1 'ls -la /opt/recon-backup/concepts/ | tail -5' +# Confirm timestamps match within 6 hours +``` + +### 4. Confirm RECON service state + +```bash +ssh zvx@192.168.1.130 'sudo systemctl status recon --no-pager' +# Note: do NOT restart until Step 3. If currently running, confirm no active +# enrichment/embedding workers before proceeding. +``` + +--- + +## Step 1: Deploy PeerTube Plugin to CT 110 + +```bash +# From recon-vm, stage the plugin as a tarball on the Proxmox host +ssh zvx@192.168.1.130 +cd /opt/recon/peertube-plugin/ +tar -czf /tmp/peertube-plugin-recon-domains.tgz peertube-plugin-recon-domains +scp /tmp/peertube-plugin-recon-domains.tgz root@192.168.1.243:/tmp/ + +# Then on the Proxmox host (192.168.1.243, the host used for CT 110 in the other steps): +ssh root@192.168.1.243 # media host +pct exec 110 -- mkdir -p /var/www/peertube/storage/plugins/node_modules +# pct push copies a single file, hence the tarball +pct push 110 /tmp/peertube-plugin-recon-domains.tgz /tmp/peertube-plugin-recon-domains.tgz +pct exec 110 -- tar -xzf /tmp/peertube-plugin-recon-domains.tgz -C /var/www/peertube/storage/plugins/node_modules/ +``` + +Alternative: Install via PeerTube admin UI (Admin > Plugins > Install). 
+ +```bash +# Restart PeerTube to register plugin +ssh root@192.168.1.243 'pct exec 110 -- systemctl restart peertube' +``` + +**STOP.** Check PeerTube logs for plugin registration errors: + +```bash +ssh root@192.168.1.243 'pct exec 110 -- journalctl -u peertube --since=-5min' | grep -i plugin +``` + +If any errors reference `peertube-plugin-recon-domains`, do NOT proceed. Diagnose +and fix the plugin before continuing. See Rollback: "Plugin install fails" below. + +## Step 2: Verify Plugin + +```bash +# From recon-vm +curl -s http://192.168.1.170:9000/api/v1/videos/categories -H "Host: stream.echo6.co" | python3 -m json.tool | grep -E '"1[0-1][0-9]"' +``` + +Should show all 18 categories (IDs 100-117). If any are missing, do NOT proceed. + +Run the parity test: +```bash +cd /opt/recon && source venv/bin/activate +python3 tests/test_constants_parity.py +``` + +## Step 3: Apply Schema Migration + +**Requires RECON restart (ask user first).** + +```bash +sudo systemctl restart recon +``` + +The migration runs automatically on startup via `StatusDB._init_db()`. Verify: + +```bash +cd /opt/recon && source venv/bin/activate +python3 -c " +from lib.status import StatusDB +db = StatusDB() +conn = db._get_conn() +cols = [r[1] for r in conn.execute('PRAGMA table_info(documents)').fetchall()] +for c in ['recon_domain', 'recon_domain_status', 'recon_domain_assigned_at', 'peertube_category_pushed_at']: + assert c in cols, f'Missing: {c}' + print(f' {c}: OK') + +# Verify index exists +indexes = [r[1] for r in conn.execute('PRAGMA index_list(documents)').fetchall()] +assert 'idx_documents_recon_domain_status' in indexes, 'Missing index' +print(' idx_documents_recon_domain_status: OK') + +# Verify no columns were dropped +expected_existing = ['hash', 'status', 'filename', 'discovered_at'] +for c in expected_existing: + assert c in cols, f'ALERT: existing column {c} is missing!' 
+print('Migration verified — all columns present, no existing columns dropped') +" +``` + +## Step 4: Run Backfill + +```bash +cd /opt/recon && source venv/bin/activate + +# Dry run first +python3 recon.py assign-categories --backfill --dry-run +``` + +**STOP.** Verify dry-run output distribution roughly matches investigation benchmarks: +- ~94.8% `assigned` (clear winners) +- ~5.2% `tied_pass_1` (ties) +- ~19.5% `needs_reprocess` (missing/legacy concepts) + +If the distribution deviates more than 5 percentage points from these benchmarks, +halt and investigate. Do not proceed until the deviation is explained. + +```bash +# Execute pass 1 +python3 recon.py assign-categories --backfill +``` + +**STOP.** Spot-check 20 random assigned documents: + +```bash +python3 -c " +from lib.status import StatusDB +db = StatusDB() +rows = db._get_conn().execute( + \"SELECT d.hash, d.recon_domain FROM documents d WHERE d.recon_domain_status = 'assigned' ORDER BY RANDOM() LIMIT 20\" +).fetchall() +for r in rows: + print(r['hash'][:12], r['recon_domain']) +" +``` + +For each, visually verify against concept files: `ls data/concepts/{hash}/` and +spot-check one `window_*.json` to confirm the assigned domain is plausible. +Halt if any are wildly wrong. See Rollback: "Clear wrong backfill assignments" below. + +```bash +# Run tiebreaker pass +python3 recon.py assign-categories --tiebreaker-pass +``` + +**STOP.** Verify tiebreaker results: + +```bash +python3 -c " +from lib.status import StatusDB +db = StatusDB() +c = db.get_domain_status_counts() +print('Status breakdown:', c) +print() +print('tied_pass_2 (resolved):', c.get('tied_pass_2', 0)) +print('tied_manual (needs review):', c.get('tied_manual', 0)) +" +``` + +Spot-check 5 `tied_pass_2` items — verify the resolved domain is plausible given +the channel's other content. + +```bash +# Check overall status +python3 recon.py assign-categories +``` + +## Step 5: Push to PeerTube + +Push in stages. Do NOT push all at once. 
+ +```bash +# Dry run: confirm count +python3 recon.py assign-categories --push-pending --dry-run + +# Stage 1: push 100 items +python3 recon.py assign-categories --push-pending --limit 100 +``` + +**STOP.** Verify in PeerTube UI (stream.echo6.co admin, or via API) that 100 videos +now show RECON domain categories. Spot-check 5 videos. + +```bash +# Verify via API: pick a random pushed video +python3 -c " +from lib.status import StatusDB +db = StatusDB() +row = db._get_conn().execute( + \"SELECT d.recon_domain, c.path FROM documents d LEFT JOIN catalogue c ON d.hash = c.hash WHERE d.peertube_category_pushed_at IS NOT NULL ORDER BY RANDOM() LIMIT 1\" +).fetchone() +if row: + uuid = row['path'].rsplit('/w/', 1)[-1] if row['path'] and '/w/' in row['path'] else '?' + print(f'Domain: {row[\"recon_domain\"]} UUID: {uuid}') + print(f'Check: curl -s http://192.168.1.170:9000/api/v1/videos/{uuid} -H \"Host: stream.echo6.co\" | python3 -m json.tool | grep category') +" +``` + +```bash +# Stage 2: push 1000 items +python3 recon.py assign-categories --push-pending --limit 1000 +``` + +**STOP.** Verify via PeerTube database: + +```bash +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT category, count(*) FROM video WHERE category >= 100 GROUP BY category ORDER BY count DESC"' +``` + +```bash +# Stage 3: push remaining +python3 recon.py assign-categories --push-pending +``` + +## Step 6: Verify + +```bash +# Check PeerTube database directly +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT category, count(*) FROM video WHERE category >= 100 GROUP BY category ORDER BY count DESC"' + +# Check uncategorized +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT count(*) FROM video WHERE category IS NULL"' + +# Check RECON status +python3 recon.py assign-categories +``` + +## Step 7: Reprocess Missing Items (SEPARATE POST-DEPLOY OPERATION) + +**WARNING:** This step 
deletes concept directories. It is the only destructive +operation in the entire feature. Run it separately from the initial deploy, +after all other steps are verified and stable. + +```bash +# Dry run first — review what would be deleted +python3 recon.py assign-categories --reprocess-missing --dry-run --limit 10 +``` + +**STOP.** Review output. Verify concept dirs listed are genuinely stale (legacy +domains only, or missing concept files). The dry-run reports file counts for +each directory that would be deleted. + +```bash +# Small batch +python3 recon.py assign-categories --reprocess-missing --limit 10 +``` + +**STOP.** Verify: check that 10 items re-entered the pipeline. + +```bash +python3 recon.py status # queued count should increase by ~10 +``` + +Wait for pipeline to process them. Verify domain assignment on completion: + +```bash +# Check these specific items got re-enriched and assigned +python3 recon.py assign-categories +``` + +```bash +# Scale up +python3 recon.py assign-categories --reprocess-missing --limit 100 + +# Then unbounded +python3 recon.py assign-categories --reprocess-missing +``` + +**Note on interrupts:** If `--reprocess-missing` is interrupted mid-run, re-running +it is safe. Any documents stranded at `status='catalogued'` without being re-queued +can be recovered with `recon.py queue --source stream.echo6.co`. + +## Step 8: Dashboard Review + +Navigate to `https://recon.echo6.co/peertube/review` to review `tied_manual` items. +Each row shows the video, channel, tied domains, and concept counts. Select the +correct domain and click Assign. 
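Once review is under way, overall progress can be summarised against the "Categorized" filter from `domain-assignment.md`. The following is a minimal sketch rather than part of RECON: `summarize_domain_status` is a hypothetical helper, it assumes `StatusDB.get_domain_status_counts()` returns a plain status-to-count mapping (as the `.get()` calls in Step 4 suggest), and the sample numbers are invented for illustration.

```python
# Hypothetical helper, not part of RECON: summarises a status -> count mapping.
CATEGORIZED = {"assigned", "tied_pass_2", "manual_assigned"}  # "Categorized" filter


def summarize_domain_status(counts):
    """Split status counts into categorized and still-pending buckets."""
    total = sum(counts.values())
    categorized = sum(n for status, n in counts.items() if status in CATEGORIZED)
    return {
        "total": total,
        "categorized": categorized,
        "awaiting_tiebreaker": counts.get("tied_pass_1", 0),
        "awaiting_review": counts.get("tied_manual", 0),
        "needs_reprocess": counts.get("needs_reprocess", 0),
    }


if __name__ == "__main__":
    # Invented sample data; on recon-vm this would come from
    # StatusDB().get_domain_status_counts() (assumed to return a dict).
    sample = {"assigned": 90, "tied_pass_2": 4, "manual_assigned": 1,
              "tied_manual": 3, "needs_reprocess": 2}
    for key, value in summarize_domain_status(sample).items():
        print(f"{key}: {value}")
```

The review queue is empty when `awaiting_review` reaches 0. Items counted under `needs_reprocess` are not affected by dashboard review and remain until Step 7 is run.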
+ +--- + +## Rollback Procedures + +### Plugin install fails or breaks PeerTube + +```bash +# Disable plugin without uninstalling +ssh root@192.168.1.243 'pct exec 110 -- bash -c " + mv /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains \ + /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains.disabled + systemctl restart peertube +"' + +# Verify PeerTube is healthy +curl -s http://192.168.1.170:9000/api/v1/videos/categories -H "Host: stream.echo6.co" | python3 -m json.tool | head + +# To fully remove: use PeerTube admin UI → Plugins → Uninstall +``` + +### Schema migration revert (drop new columns) + +Only needed if the columns cause problems. The columns are nullable and have no +constraints, so they should be inert. + +```bash +ssh zvx@192.168.1.130 'cd /opt/recon && source venv/bin/activate && python3 -c " +import sqlite3 +conn = sqlite3.connect(\"data/recon.db\") +for col in [\"recon_domain\", \"recon_domain_status\", \"recon_domain_assigned_at\", \"peertube_category_pushed_at\"]: + try: + conn.execute(f\"ALTER TABLE documents DROP COLUMN {col}\") + print(f\"Dropped: {col}\") + except Exception as e: + print(f\"Skip {col}: {e}\") +conn.execute(\"DROP INDEX IF EXISTS idx_documents_recon_domain_status\") +conn.commit() +print(\"Index dropped\") +"' +``` + +Note: SQLite ALTER TABLE DROP COLUMN requires SQLite 3.35.0+ (2021-03-12). +Ubuntu 24.04 ships 3.45.1 — this is fine. 
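The version requirement can be confirmed before attempting the revert. This is a minimal sketch, assuming it is run inside the same venv as the revert script: Python's `sqlite3` module reports the version of the SQLite library it is linked against, which may differ from the system `sqlite3` CLI. `supports_drop_column` is a hypothetical helper name, not something RECON provides.

```python
import sqlite3

# First SQLite release that supports ALTER TABLE ... DROP COLUMN.
MIN_DROP_COLUMN = (3, 35, 0)


def supports_drop_column(version_info=None):
    """Return True if the given (or linked) SQLite version can drop columns."""
    if version_info is None:
        # Version of the SQLite library this Python build is linked against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= MIN_DROP_COLUMN


if __name__ == "__main__":
    print("SQLite library version:", sqlite3.sqlite_version)
    print("DROP COLUMN supported:", supports_drop_column())
```

If the check reports `False`, the revert script above will print `Skip ...` for each column rather than dropping it, so the columns would stay in place.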
+ +### Clear wrong backfill assignments (selective or full) + +```bash +cd /opt/recon && source venv/bin/activate + +# Clear ALL domain assignments +python3 -c " +from lib.status import StatusDB +db = StatusDB() +conn = db._get_conn() +conn.execute('''UPDATE documents SET + recon_domain = NULL, recon_domain_status = NULL, + recon_domain_assigned_at = NULL, peertube_category_pushed_at = NULL''') +conn.commit() +print('Cleared all domain assignments') +" + +# Clear only tiebreaker results (reset to tied_pass_1 for re-run) +python3 -c " +from lib.status import StatusDB +db = StatusDB() +conn = db._get_conn() +conn.execute('''UPDATE documents SET + recon_domain = NULL, recon_domain_status = 'tied_pass_1', + recon_domain_assigned_at = NULL +WHERE recon_domain_status IN ('tied_pass_2', 'tied_manual')''') +conn.commit() +" +``` + +### Clear wrong PeerTube categories + +```bash +# Reset ALL RECON categories (100+) to NULL in PeerTube +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod \ + -c "UPDATE video SET category = NULL WHERE category >= 100"' + +# Verify +ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod \ + -c "SELECT count(*) FROM video WHERE category >= 100"' +# Should return 0 + +# Also clear RECON pushed timestamps so --push-pending can retry +cd /opt/recon && source venv/bin/activate +python3 -c " +from lib.status import StatusDB +db = StatusDB() +conn = db._get_conn() +conn.execute('UPDATE documents SET peertube_category_pushed_at = NULL WHERE peertube_category_pushed_at IS NOT NULL') +conn.commit() +print('Cleared push timestamps') +" +``` + +### Restore concepts after failed --reprocess-missing + +```bash +# Concept backups are on Contabo at /opt/recon-backup/concepts/ +# Identify which hashes were deleted (check RECON logs) +ssh zvx@192.168.1.130 'grep "Deleting concept dir" /opt/recon/logs/recon.log | tail -20' + +# Restore specific hash from Contabo +HASH= +ssh root@100.64.0.1 "tar -cf - -C 
/opt/recon-backup/concepts/ $HASH" | \ + ssh zvx@192.168.1.130 "tar -xf - -C /opt/recon/data/concepts/" + +# Restore ALL concepts (nuclear option) +ssh root@100.64.0.1 'rsync -av /opt/recon-backup/concepts/ zvx@192.168.1.130:/opt/recon/data/concepts/' +``` + +### Fully remove feature + +1. Uninstall plugin from PeerTube admin UI +2. Restart PeerTube +3. Revert RECON code changes (`git checkout master`) +4. Restart RECON +5. Drop schema columns (see above) +6. Reset PeerTube categories (see above)