Mirror of https://github.com/zvx-echo6/recon.git, synced 2026-05-20 06:34:40 +02:00

Docs: domain assignment guide, migration runbook, blast radius

- domain-assignment.md: algorithm walkthrough (pass 1/2/3), status values, CLI command reference, dashboard review guide
- migration-runbook.md: step-by-step deploy with pre-deploy backups, 8 STOP pause points for operator verification, staged push rollout, quarantined --reprocess-missing procedure, 5 rollback procedures
- deploy-blast-radius.md: per-step risk reference with worst case, detection signals, rollback procedures, and risk tiers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Parent: d1270be64d, commit: a39ec56620

3 changed files with 557 additions and 0 deletions

`docs/deploy-blast-radius.md` (new file, 20 lines)

# Deploy Blast Radius Reference

Quick-reference for operators during deployment of domain categorization.

| Step | What Changes | Worst Case (Partial Failure) | Detection Signal | Rollback | Est. Rollback Time |
|------|-------------|------------------------------|-----------------|----------|-------------------|
| **Plugin install** | PeerTube plugin dir on CT 110 | PeerTube fails to start | `systemctl status peertube` shows failed | Move plugin dir to `.disabled`, restart PeerTube | 2 min |
| **PeerTube restart** | PeerTube service state | PeerTube crash loop | `journalctl -u peertube` shows repeated failures | Disable plugin, restart | 2 min |
| **Schema migration** (RECON restart) | 4 new nullable columns + 1 index in recon.db | Migration SQL error leaves partial columns | Python PRAGMA check fails | DROP the new index, then DROP COLUMN for each added column | 5 min |
| **--backfill** | `recon_domain` + `recon_domain_status` on ~22K rows | Wrong domain assignments | Spot-check 20 random docs | `UPDATE documents SET recon_domain = NULL, recon_domain_status = NULL ...` | 1 min |
| **--tiebreaker-pass** | ~1,100 rows: `tied_pass_1` to `tied_pass_2`/`tied_manual` | Wrong tiebreaker resolution | Spot-check 5 resolved items | Reset `tied_pass_2`/`tied_manual` back to `tied_pass_1` | 1 min |
| **--push-pending** | PeerTube `video.category` column on ~22K rows | Wrong categories visible to all PeerTube users | PeerTube UI shows wrong labels | `UPDATE video SET category = NULL WHERE category >= 100` + clear push timestamps | 2 min |
| **--reprocess-missing** | **DELETES** concept directories (irreversible locally) | Concepts deleted, re-enrichment fails (Gemini API down, quota hit) | `recon.py status` shows stuck `queued` items, concept dirs missing | Restore from Contabo backup (`rsync`) | 10-60 min depending on count |

## Risk Tiers

- **Low risk (read-only):** `--dry-run` on any command, status display
- **Medium risk (DB-only, reversible):** `--backfill`, `--tiebreaker-pass`, schema migration
- **High risk (external writes):** `--push-pending` (writes to PeerTube, visible to users)
- **Critical risk (destructive):** `--reprocess-missing` (deletes concept files, $130+ Gemini work at risk)

`docs/domain-assignment.md` (new file, 111 lines)

# Domain Assignment — Algorithm & Operations Guide

## Overview

RECON's domain assignment feature maps each PeerTube video to one of 18 knowledge domains by analyzing the concepts extracted from its transcript. Assignments are pushed to PeerTube as category metadata via a custom plugin.

## Algorithm

### Pass 1: Concept Domain Count (inline, per-document)

Runs automatically via the post-embed hook when a video completes the pipeline, or in bulk via `--backfill`.

1. Read all `data/concepts/{hash}/window_*.json` files
2. Count domain occurrences across all concepts, keeping only `VALID_DOMAINS` (legacy domains are skipped)
3. If no valid concepts → `needs_reprocess`
4. If a single domain has the top count → `assigned`
5. If two or more domains tie for the top count → `tied_pass_1` (deferred to tiebreaker)

### Pass 2: Channel Tiebreaker (batch)

Runs via `assign-categories --tiebreaker-pass`.

For each `tied_pass_1` document:

1. Identify the tied domains
2. Look up the document's channel (`catalogue.category`)
3. **Mega-channel rule:** if the channel has more than 500 videos, skip tiebreaking → `tied_manual`
4. Read the concept files for all other videos in the same channel
5. Among the tied domains only, pick the one with the highest channel-wide concept count
6. If resolved → `tied_pass_2`
7. If still tied → proceed to pass 3

### Pass 3: Defensive Re-Run

If pass 2 does not resolve the tie, pass 3 re-reads the same channel concept files and re-runs identical counting logic. This catches concept-file changes that occurred mid-run (e.g. concurrent enrichment writing new windows during the batch). In steady state, pass 3 produces the same result as pass 2. Under concurrent writes, it can resolve a tie that pass 2 missed.

- If resolved → `tied_pass_2` (same status; the column tracks "channel scan resolved it")
- If still tied → `tied_manual` (alphabetical fallback assigned, flagged for review)

### Mega-Channel Rule

Channels with more than 500 videos (like the "Transcript" catch-all with ~9,200 videos) are not topically coherent, so aggregating their concepts yields meaningless counts. Their tied documents go straight to `tied_manual` for dashboard review.

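Passes 2 and 3 can be sketched together under stated assumptions. `resolve_tie` and `load_channel_counts` are hypothetical names. `load_channel_counts` stands for a callable that re-reads the concept files of the channel's other videos and returns a domain-to-count `Counter`; calling it a second time models pass 3's defensive re-read. Assigning the alphabetical fallback to mega-channel documents as well is an assumption: the guide only states that they go to `tied_manual`.

```python
# Sketch of passes 2-3 under stated assumptions, not lib/domain_assigner.py.
from collections import Counter
from typing import Callable

MEGA_CHANNEL_THRESHOLD = 500  # channels with more videos skip the scan


def resolve_tie(
    tied: list[str],
    channel_size: int,
    load_channel_counts: Callable[[], Counter],
) -> tuple[str, str]:
    """Return (status, domain) for a tied_pass_1 document."""
    fallback = sorted(tied)[0]  # alphabetical fallback used for tied_manual
    if channel_size > MEGA_CHANNEL_THRESHOLD:
        return "tied_manual", fallback  # mega-channel rule (fallback assumed)
    for _pass in ("pass 2", "pass 3"):  # pass 3 repeats the identical scan
        counts = load_channel_counts()  # re-read channel concept files
        best = max(counts[d] for d in tied)  # only the tied domains compete
        winners = [d for d in tied if counts[d] == best]
        if len(winners) == 1:
            return "tied_pass_2", winners[0]
    return "tied_manual", fallback
```

Passing the channel scan in as a callable keeps the tie logic testable without touching `data/concepts/`. It also makes the pass 3 re-read explicit rather than a cached result.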
## Status Values

| Status | Meaning | Next Action |
|--------|---------|-------------|
| `assigned` | Clear winner from pass 1 | Push to PeerTube |
| `tied_pass_1` | Concept tie, awaiting tiebreaker | Run `--tiebreaker-pass` |
| `tied_pass_2` | Resolved by channel tiebreaker (pass 2 or pass 3) | Push to PeerTube |
| `tied_manual` | Needs human review | Review at `/peertube/review` |
| `needs_reprocess` | Missing concepts or only legacy domains | Run `--reprocess-missing` |
| `manual_assigned` | Human override from dashboard | Already pushed |

**"Categorized" filter** = `{'assigned', 'tied_pass_2', 'manual_assigned'}`

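The "Categorized" filter maps directly to an SQL `IN` clause. In this illustration, an in-memory table stands in for the `documents` table in `recon.db`, and `categorized_count` is a hypothetical helper. The statuses are bound as parameters rather than spliced into the SQL string.

```python
# Illustrative only: an in-memory table stands in for recon.db's `documents`.
import sqlite3

CATEGORIZED = ("assigned", "tied_pass_2", "manual_assigned")


def categorized_count(conn: sqlite3.Connection) -> int:
    """Count documents whose status falls in the "Categorized" filter."""
    placeholders = ", ".join("?" for _ in CATEGORIZED)
    sql = f"SELECT count(*) FROM documents WHERE recon_domain_status IN ({placeholders})"
    return conn.execute(sql, CATEGORIZED).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (hash TEXT, recon_domain_status TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("h1", "assigned"), ("h2", "tied_manual"), ("h3", "tied_pass_2"),
     ("h4", "needs_reprocess"), ("h5", "manual_assigned"), ("h6", None)],
)
print(categorized_count(conn))  # 3
```

`tied_manual` rows stay out of the count even though they carry an alphabetical fallback domain, matching the table above.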
## CLI Commands

```bash
cd /opt/recon && source venv/bin/activate

# Show current assignment status
python3 recon.py assign-categories

# Pass 1: backfill all unassigned complete stream documents
python3 recon.py assign-categories --backfill --dry-run
python3 recon.py assign-categories --backfill

# Passes 2-3: resolve ties via channel analysis
python3 recon.py assign-categories --tiebreaker-pass

# Push all assigned-but-unpushed categories to the PeerTube API
python3 recon.py assign-categories --push-pending

# Re-queue items with missing/legacy concepts (DESTRUCTIVE: deletes concept dirs)
python3 recon.py assign-categories --reprocess-missing

# Limit processing count
python3 recon.py assign-categories --backfill --limit 100
```

## Dashboard Review

The review UI at `recon.echo6.co/peertube/review` shows only `tied_manual` items. Each row displays:

- Video title and channel
- Top concept domains with counts
- Dropdown to select the correct domain
- Assign button (pushes to PeerTube immediately)

Items with `needs_reprocess` status do NOT appear in the review UI; they are handled exclusively via the CLI `--reprocess-missing` command.

## Pipeline Integration

New videos ingested via the PeerTube collector are automatically assigned a domain when they complete the embed stage. The post-embed hook in `lib/embedder.py`:

1. Runs `compute_assignment()` (pass 1 only)
2. If clear winner: pushes the category to PeerTube immediately
3. If tied: marks the document `tied_pass_1` for the next tiebreaker batch run
4. On error: logs a warning and continues; it does not block the pipeline

## Source Files

| File | Purpose |
|------|---------|
| `lib/recon_domains.py` | Domain↔Category ID mapping, VALID_DOMAINS |
| `lib/domain_assigner.py` | `compute_assignment()` + `run_tiebreaker_pass()` |
| `lib/peertube_writer.py` | OAuth2 client, `push_category()`, `push_pending()` |
| `lib/embedder.py` | Post-embed hook |
| `lib/status.py` | DB columns + helper methods |
| `lib/api.py` | Dashboard review routes |
| `recon.py` | CLI `assign-categories` command |

`docs/migration-runbook.md` (new file, 426 lines)

# Domain Categorization Migration Runbook

Step-by-step procedure to deploy the PeerTube domain categorization feature.

## Prerequisites

- Feature branch `feature/peertube-domain-categorization` merged to master (or checked out)
- SSH access to recon-vm (192.168.1.130), the Proxmox media host that runs CT 110 (192.168.1.243), and the Contabo backup host (100.64.0.1); CT 110 itself is at 192.168.1.170
- PeerTube admin credentials (`root` / password in `.env`)

## Pre-Deploy Backups

These backups MUST be completed before any state-changing step.

### 1. Snapshot RECON database

```bash
ssh zvx@192.168.1.130
# cp is only consistent if nothing is writing to recon.db (and if WAL mode is on, the -wal file holds recent commits)
cp /opt/recon/data/recon.db "/opt/recon/data/recon.db.pre-domain-feature.$(date +%Y%m%d_%H%M%S).bak"
ls -la /opt/recon/data/recon.db.pre-domain-feature.*.bak  # Confirm
```

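Before moving on, you can confirm that the snapshot is a usable SQLite database, not just a non-empty file. This optional check is not part of the original procedure. `backup_is_healthy` and the file name `check_backup.py` are hypothetical. The script uses only the standard `sqlite3` module and SQLite's `PRAGMA integrity_check`.

```python
# Optional extra check (hypothetical helper): verify a recon.db snapshot opens
# as SQLite and passes PRAGMA integrity_check.
import sqlite3
import sys
from contextlib import closing
from pathlib import Path


def backup_is_healthy(path: str) -> bool:
    """True if `path` is a non-empty SQLite file whose integrity_check says ok."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False  # sqlite3.connect would silently create/accept an empty DB
    try:
        with closing(sqlite3.connect(p)) as conn:
            row = conn.execute("PRAGMA integrity_check").fetchone()
    except sqlite3.DatabaseError:  # e.g. "file is not a database"
        return False
    return row is not None and row[0] == "ok"


if __name__ == "__main__" and len(sys.argv) > 1:
    ok = backup_is_healthy(sys.argv[1])
    print("backup OK" if ok else "backup FAILED integrity check")
    sys.exit(0 if ok else 1)
```

Save the script as `check_backup.py` and run `python3 check_backup.py /opt/recon/data/recon.db.pre-domain-feature.<timestamp>.bak`. The script exits non-zero if the snapshot is missing, empty, or fails the check.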
### 2. Snapshot PeerTube PostgreSQL

```bash
# The dump streams back over SSH and is written to /tmp on the machine you run this from
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres pg_dump peertube_prod' > "/tmp/peertube_prod.pre-domain-feature.$(date +%Y%m%d_%H%M%S).sql"
ls -la /tmp/peertube_prod.pre-domain-feature.*.sql  # Confirm non-zero
```

### 3. Verify off-site concept backup

```bash
# Check last rsync to Contabo: list the most recently modified entries on each side
ssh zvx@192.168.1.130 'ls -lt /opt/recon/data/concepts/ | head -5'
ssh root@100.64.0.1 'ls -lt /opt/recon-backup/concepts/ | head -5'
# Confirm timestamps match within 6 hours
```

### 4. Confirm RECON service state

```bash
ssh zvx@192.168.1.130 'sudo systemctl status recon --no-pager'
# Note: do NOT restart until Step 3. If currently running, confirm no active
# enrichment/embedding workers before proceeding.
```

---

## Step 1: Deploy PeerTube Plugin to CT 110

```bash
# From recon-vm, stream the plugin directory into CT 110 via the Proxmox media host
ssh zvx@192.168.1.130
cd /opt/recon/peertube-plugin/
ssh root@192.168.1.243 'pct exec 110 -- mkdir -p /var/www/peertube/storage/plugins/node_modules'
tar -cf - peertube-plugin-recon-domains | \
  ssh root@192.168.1.243 'pct exec 110 -- tar -xf - -C /var/www/peertube/storage/plugins/node_modules/'
# Give the files to the PeerTube service user (assumes the standard `peertube` user)
ssh root@192.168.1.243 'pct exec 110 -- chown -R peertube:peertube /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains'

# Or via the Proxmox host:
ssh root@192.168.1.243  # media host
pct exec 110 -- bash -c 'mkdir -p /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains'
# Copy files into the container (pct push copies one file per call: pct push 110 <file> <dest>)
```

Alternative: Install via PeerTube admin UI (Admin > Plugins > Install).

```bash
# Restart PeerTube to register plugin
ssh root@192.168.1.243 'pct exec 110 -- systemctl restart peertube'
```

**STOP.** Check PeerTube logs for plugin registration errors:

```bash
ssh root@192.168.1.243 'pct exec 110 -- journalctl -u peertube --since=-5min' | grep -i plugin
```

If any errors reference `peertube-plugin-recon-domains`, do NOT proceed. Diagnose and fix the plugin before continuing. See Rollback: "Plugin install fails or breaks PeerTube" below.

## Step 2: Verify Plugin

```bash
# From recon-vm
curl -s http://192.168.1.170:9000/api/v1/videos/categories -H "Host: stream.echo6.co" | python3 -m json.tool | grep -E '"1[0-1][0-9]"'
```

Should show all 18 categories (IDs 100-117). If any are missing, do NOT proceed.

Run the parity test:

```bash
cd /opt/recon && source venv/bin/activate
python3 tests/test_constants_parity.py
```

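The category check above can also be scripted so that missing IDs are listed explicitly instead of eyeballed from `grep` output. This sketch uses only the Python standard library. `missing_category_ids` and `fetch_categories` are hypothetical helper names. The URL and `Host` header are copied from the curl command in this step.

```python
# Sketch: list which RECON category IDs (100-117) are missing from PeerTube's
# /api/v1/videos/categories response. Helper names are hypothetical.
import json
import urllib.request

EXPECTED_IDS = {str(i) for i in range(100, 118)}  # 18 RECON domains


def missing_category_ids(categories: dict[str, str]) -> list[str]:
    """Return expected IDs absent from the API response, sorted numerically."""
    return sorted(EXPECTED_IDS - set(categories), key=int)


def fetch_categories(base: str = "http://192.168.1.170:9000") -> dict[str, str]:
    """GET the category map ({"id": "label", ...}) from PeerTube."""
    req = urllib.request.Request(
        f"{base}/api/v1/videos/categories",
        headers={"Host": "stream.echo6.co"},  # same vhost as the curl check
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Run from recon-vm, `print(missing_category_ids(fetch_categories()))` should print `[]`. Any other output means the plugin did not register every domain.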
## Step 3: Apply Schema Migration

**Requires a RECON restart (confirm with the user before restarting).**

```bash
sudo systemctl restart recon
```

The migration runs automatically on startup via `StatusDB._init_db()`. Verify:

```bash
cd /opt/recon && source venv/bin/activate
python3 -c "
from lib.status import StatusDB

db = StatusDB()
conn = db._get_conn()
cols = [r[1] for r in conn.execute('PRAGMA table_info(documents)').fetchall()]
for c in ['recon_domain', 'recon_domain_status', 'recon_domain_assigned_at', 'peertube_category_pushed_at']:
    assert c in cols, f'Missing: {c}'
    print(f'  {c}: OK')

# Verify index exists
indexes = [r[1] for r in conn.execute('PRAGMA index_list(documents)').fetchall()]
assert 'idx_documents_recon_domain_status' in indexes, 'Missing index'
print('  idx_documents_recon_domain_status: OK')

# Verify no pre-existing columns were dropped
expected_existing = ['hash', 'status', 'filename', 'discovered_at']
for c in expected_existing:
    assert c in cols, f'ALERT: existing column {c} is missing'
print('Migration verified — all columns present, no existing columns dropped')
"
```

## Step 4: Run Backfill

```bash
cd /opt/recon && source venv/bin/activate

# Dry run first
python3 recon.py assign-categories --backfill --dry-run
```

**STOP.** Verify that the dry-run output distribution roughly matches the investigation benchmarks:

- ~94.8% `assigned` (clear winners)
- ~5.2% `tied_pass_1` (ties)
- ~19.5% `needs_reprocess` (missing/legacy concepts)

These figures sum to more than 100%. `assigned` and `tied_pass_1` alone make up 100%, so they appear to be shares of documents with valid concepts, while `needs_reprocess` appears to be a share of all documents. Compute each share against the matching base. If any share deviates more than 5 percentage points from its benchmark, halt and investigate. Do not proceed until the deviation is explained.

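The 5-percentage-point check can be done mechanically from the status counts. This sketch assumes, as the benchmark figures suggest, that the `assigned` and `tied_pass_1` shares are measured against documents with valid concepts (the two sum to 100%) and that `needs_reprocess` is measured against all documents. `distribution_deviations` is a hypothetical helper. It takes a plain `{status: count}` dict, the same shape `get_domain_status_counts()` appears to return in the tiebreaker check, but you could also type the numbers in from the dry-run output.

```python
# Hypothetical helper: compare status counts against the runbook benchmarks.
# Assumes assigned/tied shares use documents with valid concepts as the base,
# and needs_reprocess uses all documents.
BENCHMARKS = {"assigned": 94.8, "tied_pass_1": 5.2, "needs_reprocess": 19.5}
TOLERANCE_PP = 5.0


def distribution_deviations(counts: dict[str, int]) -> dict[str, float]:
    """Return {status: deviation_in_pp} for every share outside the tolerance."""
    assigned = counts.get("assigned", 0)
    tied = counts.get("tied_pass_1", 0)
    reprocess = counts.get("needs_reprocess", 0)
    with_concepts = assigned + tied
    total = with_concepts + reprocess
    if with_concepts == 0:
        raise ValueError("no documents with valid concepts counted")
    shares = {
        "assigned": 100.0 * assigned / with_concepts,
        "tied_pass_1": 100.0 * tied / with_concepts,
        "needs_reprocess": 100.0 * reprocess / total,
    }
    return {
        status: round(share - BENCHMARKS[status], 1)
        for status, share in shares.items()
        if abs(share - BENCHMARKS[status]) > TOLERANCE_PP
    }
```

An empty result means every share is within tolerance. Any key in the result is a reason to halt at this STOP point.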
```bash
# Execute pass 1
python3 recon.py assign-categories --backfill
```

**STOP.** Spot-check 20 random assigned documents:

```bash
python3 -c "
from lib.status import StatusDB

db = StatusDB()
rows = db._get_conn().execute(
    \"SELECT d.hash, d.recon_domain FROM documents d WHERE d.recon_domain_status = 'assigned' ORDER BY RANDOM() LIMIT 20\"
).fetchall()
for r in rows:
    print(r['hash'][:12], r['recon_domain'])
"
```

For each, visually verify against the concept files. The script prints only the first 12 characters of each hash, so use a glob: `ls data/concepts/<prefix>*/`. Spot-check one `window_*.json` to confirm the assigned domain is plausible. Halt if any are wildly wrong. See Rollback: "Clear wrong backfill assignments" below.

```bash
# Run tiebreaker pass
python3 recon.py assign-categories --tiebreaker-pass
```

**STOP.** Verify tiebreaker results:

```bash
python3 -c "
from lib.status import StatusDB

db = StatusDB()
c = db.get_domain_status_counts()
print('Status breakdown:', c)
print()
print('tied_pass_2 (resolved):', c.get('tied_pass_2', 0))
print('tied_manual (needs review):', c.get('tied_manual', 0))
"
```

Spot-check 5 `tied_pass_2` items and verify that the resolved domain is plausible given the channel's other content.

```bash
# Check overall status
python3 recon.py assign-categories
```

## Step 5: Push to PeerTube

Push in stages. Do NOT push all at once.

```bash
# Dry run: confirm count
python3 recon.py assign-categories --push-pending --dry-run

# Stage 1: push 100 items
python3 recon.py assign-categories --push-pending --limit 100
```

**STOP.** Verify in the PeerTube UI (stream.echo6.co admin, or via API) that 100 videos now show RECON domain categories. Spot-check 5 videos.

```bash
# Verify via API: pick a random pushed video
python3 -c "
from lib.status import StatusDB

db = StatusDB()
row = db._get_conn().execute(
    \"SELECT d.recon_domain, c.path FROM documents d LEFT JOIN catalogue c ON d.hash = c.hash WHERE d.peertube_category_pushed_at IS NOT NULL ORDER BY RANDOM() LIMIT 1\"
).fetchone()
if row:
    # The video UUID is the last segment of a /w/<uuid> watch URL
    uuid = row['path'].rsplit('/w/', 1)[-1] if row['path'] and '/w/' in row['path'] else '?'
    print(f'Domain: {row[\"recon_domain\"]} UUID: {uuid}')
    print(f'Check: curl -s http://192.168.1.170:9000/api/v1/videos/{uuid} -H \"Host: stream.echo6.co\" | python3 -m json.tool | grep category')
else:
    print('No pushed videos found')
"
```

```bash
# Stage 2: push 1000 items
python3 recon.py assign-categories --push-pending --limit 1000
```

**STOP.** Verify via the PeerTube database. The per-category counts should sum to roughly 1,100 (stages 1 and 2), plus any videos the post-embed hook pushed in the meantime:

```bash
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT category, count(*) FROM video WHERE category >= 100 GROUP BY category ORDER BY count DESC"'
```

```bash
# Stage 3: push remaining
python3 recon.py assign-categories --push-pending
```

## Step 6: Verify

```bash
# Check PeerTube database directly
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT category, count(*) FROM video WHERE category >= 100 GROUP BY category ORDER BY count DESC"'

# Check uncategorized
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod -c "SELECT count(*) FROM video WHERE category IS NULL"'

# Check RECON status
python3 recon.py assign-categories
```

## Step 7: Reprocess Missing Items (SEPARATE POST-DEPLOY OPERATION)

**WARNING:** This step deletes concept directories. It is the only destructive operation in the entire feature. Run it separately from the initial deploy, after all other steps are verified and stable. Because it may run well after the deploy, re-run the off-site concept backup check (Pre-Deploy Backups, step 3) first: the Contabo copy is the only way back.

```bash
# Dry run first: review what would be deleted
python3 recon.py assign-categories --reprocess-missing --dry-run --limit 10
```

**STOP.** Review the output and confirm that the concept dirs listed are genuinely stale (legacy domains only, or missing concept files). The dry-run reports file counts for each directory that would be deleted.

```bash
# Small batch
python3 recon.py assign-categories --reprocess-missing --limit 10
```

**STOP.** Verify that the 10 items re-entered the pipeline:

```bash
python3 recon.py status  # queued count should increase by ~10
```

Wait for the pipeline to process them, then verify domain assignment on completion:

```bash
# Check these specific items got re-enriched and assigned
python3 recon.py assign-categories
```

```bash
# Scale up
python3 recon.py assign-categories --reprocess-missing --limit 100

# Then unbounded
python3 recon.py assign-categories --reprocess-missing
```

**Note on interrupts:** If `--reprocess-missing` is interrupted mid-run, re-running it is safe. Any documents stranded at `status='catalogued'` without being re-queued can be recovered with `recon.py queue --source stream.echo6.co`.

## Step 8: Dashboard Review

Navigate to `https://recon.echo6.co/peertube/review` to review `tied_manual` items. Each row shows the video, channel, tied domains, and concept counts. Select the correct domain and click Assign.

---

## Rollback Procedures

### Plugin install fails or breaks PeerTube

```bash
# Disable plugin without uninstalling
ssh root@192.168.1.243 'pct exec 110 -- bash -c "
mv /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains \
   /var/www/peertube/storage/plugins/node_modules/peertube-plugin-recon-domains.disabled
systemctl restart peertube
"'

# Verify PeerTube is healthy
curl -s http://192.168.1.170:9000/api/v1/videos/categories -H "Host: stream.echo6.co" | python3 -m json.tool | head

# To fully remove: use PeerTube admin UI → Plugins → Uninstall
```

### Schema migration revert (drop new columns)

Only needed if the columns cause problems. The columns are nullable and have no constraints, so they should be inert. Revert the RECON code first; otherwise the next RECON startup re-applies the migration via `StatusDB._init_db()`.

```bash
ssh zvx@192.168.1.130 'cd /opt/recon && source venv/bin/activate && python3 -c "
import sqlite3

conn = sqlite3.connect(\"data/recon.db\")
# Drop the index first: SQLite rejects DROP COLUMN on an indexed column
conn.execute(\"DROP INDEX IF EXISTS idx_documents_recon_domain_status\")
print(\"Index dropped\")
for col in [\"recon_domain\", \"recon_domain_status\", \"recon_domain_assigned_at\", \"peertube_category_pushed_at\"]:
    try:
        conn.execute(f\"ALTER TABLE documents DROP COLUMN {col}\")
        print(f\"Dropped: {col}\")
    except Exception as e:
        print(f\"Skip {col}: {e}\")
conn.commit()
"'
```

Note: SQLite `ALTER TABLE DROP COLUMN` requires SQLite 3.35.0+ (2021-03-12). Ubuntu 24.04 ships 3.45.1, so this is fine.

### Clear wrong backfill assignments (selective or full)

```bash
cd /opt/recon && source venv/bin/activate

# Clear ALL domain assignments, including manual_assigned dashboard overrides.
# This only touches recon.db; categories already pushed to PeerTube stay until
# cleared with "Clear wrong PeerTube categories" below.
python3 -c "
from lib.status import StatusDB

db = StatusDB()
conn = db._get_conn()
conn.execute('''UPDATE documents SET
    recon_domain = NULL, recon_domain_status = NULL,
    recon_domain_assigned_at = NULL, peertube_category_pushed_at = NULL''')
conn.commit()
print('Cleared all domain assignments')
"

# Clear only tiebreaker results (reset to tied_pass_1 for re-run).
# Also clears the push timestamp so re-resolved items are pushed again.
python3 -c "
from lib.status import StatusDB

db = StatusDB()
conn = db._get_conn()
cur = conn.execute('''UPDATE documents SET
    recon_domain = NULL, recon_domain_status = 'tied_pass_1',
    recon_domain_assigned_at = NULL, peertube_category_pushed_at = NULL
    WHERE recon_domain_status IN ('tied_pass_2', 'tied_manual')''')
conn.commit()
print(f'Reset {cur.rowcount} tiebreaker results to tied_pass_1')
"
```

### Clear wrong PeerTube categories

```bash
# Reset ALL RECON categories (100+) to NULL in PeerTube
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod \
  -c "UPDATE video SET category = NULL WHERE category >= 100"'

# Verify
ssh root@192.168.1.243 'pct exec 110 -- sudo -u postgres psql -d peertube_prod \
  -c "SELECT count(*) FROM video WHERE category >= 100"'
# Should return 0

# Also clear RECON pushed timestamps so --push-pending can retry
cd /opt/recon && source venv/bin/activate
python3 -c "
from lib.status import StatusDB

db = StatusDB()
conn = db._get_conn()
conn.execute('UPDATE documents SET peertube_category_pushed_at = NULL WHERE peertube_category_pushed_at IS NOT NULL')
conn.commit()
print('Cleared push timestamps')
"
```

### Restore concepts after failed --reprocess-missing

```bash
# Concept backups are on Contabo at /opt/recon-backup/concepts/
# Identify which hashes were deleted (check RECON logs)
ssh zvx@192.168.1.130 'grep "Deleting concept dir" /opt/recon/logs/recon.log | tail -20'

# Restore a specific hash from Contabo (streams through the machine you run this on)
HASH=<hash_from_log>
ssh root@100.64.0.1 "tar -cf - -C /opt/recon-backup/concepts/ $HASH" | \
  ssh zvx@192.168.1.130 "tar -xf - -C /opt/recon/data/concepts/"

# Restore ALL concepts (nuclear option; requires SSH access from Contabo to recon-vm)
ssh root@100.64.0.1 'rsync -av /opt/recon-backup/concepts/ zvx@192.168.1.130:/opt/recon/data/concepts/'
```

### Fully remove feature

1. Uninstall plugin from PeerTube admin UI
2. Restart PeerTube
3. Revert RECON code changes (`git checkout` the pre-feature commit, or `git revert` the merge if the feature branch is already merged to master)
4. Restart RECON
5. Drop schema columns (see above)
6. Reset PeerTube categories (see above)