# Phase 6c: Code Cleanup

## Objective

Remove dead code paths left over from the refactor. Investigation first, deletion second: only remove what's confirmed dead.

## Investigation Findings

### Expected dead code vs reality

| Item | Expected status | Actual status |
|------|----------------|---------------|
| `scanner_loop` | Dead function in recon.py | **Already removed** in Phase 5c-1 |
| `peertube_scanner_loop` | Dead function in recon.py | **Already removed** in Phase 5c-1 |
| `crawler_scheduler_loop` | Dead function in recon.py | **Already removed** in Phase 5c-1 |
| `organizer_loop` | Dead function in recon.py | **Already removed** in Phase 5c-1 |
| Extract worker thread | Vestigial in cmd_service() | **Confirmed dead**: 0 items queued, silent 24h+ |
| `lib/crawler.py` | Legacy module | **Confirmed dead**: only used by CLI subcommand |
| `lib/web_scraper.py` | Legacy module | **ALIVE**: `chunk_text()` used by transcript_processor |
| `lib/new_pipeline.py` | Legacy module | **ALIVE**: active Stream B library management tool (1,637 lines, created Apr 13) |
| `lib/peertube_scraper.py` | Legacy module | **ALIVE**: only mechanism for transcript ingestion |
| `lib/extractor.py` | Dead module | **ALIVE**: used by `cmd_run` CLI for batch processing |

### Additional findings

- **24 `.bak` files** found across `/opt/recon/` (untracked, manual pre-edit safety backups from Feb-Apr 2026). All originals preserved in git history.
- **File ownership**: All 21 `.py` files + `recon.py` correctly owned by zvx. No corrections needed.
- **No TODO/DEPRECATED comments** found in any lib/ file.
- **All imports in recon.py** confirmed used (no dead imports at module level).
- **PeerTube transcript ingestion** has no automatic mechanism since Phase 5c-1 removed `peertube_scanner_loop`. Ingestion is manual only (CLI or dashboard API endpoint).

## What Was Removed

### recon.py edits (-89 lines, +3 lines)

1. **Extract worker thread** removed from `cmd_service()`:
   - `from lib.extractor import run_extraction` import
   - `extract_workers` variable
   - `'extract': 0` from totals dict
   - Extract `threading.Thread(target=stage_loop, ...)` from thread list
   - Extract workers from startup log message
2. **`cmd_crawl` function** deleted (65 lines): CLI handler for `recon crawl`
3. **Crawl argparse subparser** deleted (15 lines): `recon crawl` subcommand registration
4. **Docstring** updated to remove `crawl` from subcommand list

### Files deleted

| File | Lines | Reason |
|------|-------|--------|
| `lib/crawler.py` | 432 | Only referenced by deleted `cmd_crawl` CLI subcommand |

### .bak files deleted (24 files, untracked)

| File | Size |
|------|------|
| `recon.py.bak-pre-streamb` | 48K |
| `recon.py.bak-pre-ux` | 35K |
| `recon.py.bak-pre-crawler` | 35K |
| `recon.py.bak.202602171647` | 33K |
| `config.yaml.bak-pre-crawler` | 4K |
| `config.yaml.bak-pre-streamb` | 13K |
| `lib/api.py.bak` + 5 more api.py backups | 498K total |
| `lib/embedder.py.bak` | 15K |
| `lib/enricher.py.bak` | 17K |
| `lib/extractor.py.bak` | 18K |
| `lib/status.py.bak-pre-ux` | 10K |
| `lib/status.py.bak-pre-streamb` | 13K |
| `scripts/validate.py.bak` | 6K |
| `scripts/rebuild_qdrant.py.bak` | 6K |
| `static/js/dashboard.js.bak` | 11K |
| `static/js/peertube.js.bak.20260223` | 5K |
| `templates/search.html.bak` | 2K |
| `templates/knowledge/dashboard.html.bak` | 3K |

## What Was Kept (and why)

| Module | Lines | Why kept |
|--------|-------|----------|
| `lib/web_scraper.py` | 324 | `transcript_processor.py` imports `chunk_text()` |
| `lib/new_pipeline.py` | 1,637 | Active Stream B library management CLI (created Apr 13) |
| `lib/peertube_scraper.py` | 580 | Only way to ingest PeerTube transcripts |
| `lib/extractor.py` | 601 | Used by `cmd_run` CLI for batch PDF processing |

## Verification

| Check | Result |
|-------|--------|
| Compile (recon.py) | OK |
| Import (recon module) | OK |
| Import (dispatcher, filing, processors) | OK |
| cmd_service assertions | extract worker absent, dispatch_loop present, filing_worker_loop present |
| Zero crawler references in .py files | Confirmed |
| Service restart | Clean, active |
| Thread count | 13 tasks (was 14; extract removed) |
| Threads started | enrich, embed, dispatcher, filing, progress, dashboard, metrics |
| Extract thread | Absent (confirmed by logs: no `[extract] Stage started`) |
| Errors (60s window) | 0 |
| DB rows | catalogue=29,812, documents=29,812 (unchanged) |
| Dashboard | Responsive |
| Hopper | Empty |

## Commit

- **Commit:** `efae402` on `refactor` branch
- **Diff:** 2 files changed, 3 insertions(+), 521 deletions(-)
- **Pushed to:** `forge.echo6.co/matt/recon` (origin/refactor)
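
For the "Zero crawler references in .py files" row in the Verification table, a check along the following lines would be one way to reproduce the result. This is a minimal sketch and not the command actually used in this phase. The function name `find_references` and its parameters are hypothetical. It does a plain substring match, so it errs toward reporting too much (for example, it would also flag a name like `crawler_scheduler_loop`).

```python
from pathlib import Path


def find_references(root, needle, suffix=".py"):
    """Return (path, line_number, stripped_line) for every line containing `needle`.

    Hypothetical helper, not part of recon. Only files whose name ends in
    `suffix` are scanned, so `.bak` copies are ignored.
    """
    hits = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            # Plain substring match: conservative, may over-report.
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_references("/opt/recon", "crawler")` and getting an empty list back would correspond to the "Confirmed" result above, assuming the tree lives at the `/opt/recon/` path mentioned in the findings.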