refactored-recon/phases/phase-6c-code-cleanup.md
Matt 3a118064ee Phase 6c: code cleanup documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 23:46:56 +00:00

Phase 6c: Code Cleanup

Objective

Remove dead code paths left over from the refactor. Investigation first, deletion second — only remove what's confirmed dead.

Investigation Findings

Expected dead code vs reality

| Item | Expected status | Actual status |
|---|---|---|
| scanner_loop | Dead function in recon.py | Already removed in Phase 5c-1 |
| peertube_scanner_loop | Dead function in recon.py | Already removed in Phase 5c-1 |
| crawler_scheduler_loop | Dead function in recon.py | Already removed in Phase 5c-1 |
| organizer_loop | Dead function in recon.py | Already removed in Phase 5c-1 |
| Extract worker thread | Vestigial in cmd_service() | Confirmed dead: 0 items queued, silent 24h+ |
| lib/crawler.py | Legacy module | Confirmed dead: only used by CLI subcommand |
| lib/web_scraper.py | Legacy module | ALIVE: chunk_text() used by transcript_processor |
| lib/new_pipeline.py | Legacy module | ALIVE: active Stream B library management tool (1,637 lines, created Apr 13) |
| lib/peertube_scraper.py | Legacy module | ALIVE: only mechanism for transcript ingestion |
| lib/extractor.py | Dead module | ALIVE: used by cmd_run CLI for batch processing |
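
The dead-or-alive calls in the table above come down to whether any remaining code still imports a module. A minimal sketch of that kind of check, assuming a static scan of import statements is sufficient, is shown below. The helper name `find_importers` is hypothetical and is not part of recon. It does not detect dynamic imports (for example via importlib) or relative imports.

```python
import ast
from pathlib import Path


def find_importers(root: str, module: str) -> dict[str, list[int]]:
    """Map each .py file under root to the line numbers where it imports `module`."""
    parent, _, leaf = module.rpartition(".")
    hits: dict[str, list[int]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        lines = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # import lib.crawler / import lib.crawler as c
                if any(a.name == module or a.name.startswith(module + ".")
                       for a in node.names):
                    lines.append(node.lineno)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                if node.module == module or node.module.startswith(module + "."):
                    # from lib.crawler import X
                    lines.append(node.lineno)
                elif parent and node.module == parent and any(
                        a.name == leaf for a in node.names):
                    # from lib import crawler
                    lines.append(node.lineno)
        if lines:
            hits[path.relative_to(root).as_posix()] = sorted(lines)
    return hits
```

Calling `find_importers("/opt/recon", "lib.crawler")` would list every file that still imports the module. A module whose only importer is code that is itself being deleted is a candidate for removal, while any other hit means it stays.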

Additional findings

  • 24 .bak files found across /opt/recon/ (untracked, manual pre-edit safety backups from Feb-Apr 2026). All originals preserved in git history.
  • File ownership: All 21 .py files + recon.py correctly owned by zvx. No corrections needed.
  • No TODO/DEPRECATED comments found in any lib/ file.
  • All imports in recon.py confirmed used (no dead imports at module level).
  • PeerTube transcript ingestion has no automatic mechanism since Phase 5c-1 removed peertube_scanner_loop. Ingestion is manual only (CLI or dashboard API endpoint).

What Was Removed

recon.py edits (-89 lines, +3 lines)

  1. Extract worker thread removed from cmd_service():

    • The from lib.extractor import run_extraction statement
    • The extract_workers variable
    • The 'extract': 0 entry in the totals dict
    • The extract threading.Thread(target=stage_loop, ...) entry in the thread list
    • The extract workers mention in the startup log message
  2. cmd_crawl function deleted (65 lines) — CLI handler for recon crawl

  3. Crawl argparse subparser deleted (15 lines) — recon crawl subcommand registration

  4. Docstring updated to remove crawl from subcommand list

Files deleted

| File | Lines | Reason |
|---|---|---|
| lib/crawler.py | 432 | Only referenced by deleted cmd_crawl CLI subcommand |

.bak files deleted (24 files, untracked)

| File | Size |
|---|---|
| recon.py.bak-pre-streamb | 48K |
| recon.py.bak-pre-ux | 35K |
| recon.py.bak-pre-crawler | 35K |
| recon.py.bak.202602171647 | 33K |
| config.yaml.bak-pre-crawler | 4K |
| config.yaml.bak-pre-streamb | 13K |
| lib/api.py.bak + 5 more api.py backups | 498K total |
| lib/embedder.py.bak | 15K |
| lib/enricher.py.bak | 17K |
| lib/extractor.py.bak | 18K |
| lib/status.py.bak-pre-ux | 10K |
| lib/status.py.bak-pre-streamb | 13K |
| scripts/validate.py.bak | 6K |
| scripts/rebuild_qdrant.py.bak | 6K |
| static/js/dashboard.js.bak | 11K |
| templates/search.html.bak | 2K |
| static/js/peertube.js.bak.20260223 | 5K |
| templates/knowledge/dashboard.html.bak | 3K |
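
An inventory like the one above can be produced with a short script. The following is a sketch only, assuming backups are identified purely by having `.bak` in the file name. The helper name `list_backups` is hypothetical, and the script does not consult git, so whether a file is untracked has to be confirmed separately.

```python
from pathlib import Path


def list_backups(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file whose name contains '.bak'."""
    found = []
    # "*.bak*" matches both "x.py.bak" and suffixed forms such as "x.py.bak-pre-ux".
    for path in Path(root).rglob("*.bak*"):
        if path.is_file():
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return sorted(found)


def summarize(backups: list[tuple[str, int]]) -> str:
    """One-line summary: file count and total size in KiB."""
    total_kib = sum(size for _, size in backups) / 1024
    return f"{len(backups)} files, {total_kib:.0f}K total"
```

For example, `print(summarize(list_backups("/opt/recon")))` gives a count to compare against the expected number before anything is deleted.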

What Was Kept (and why)

| Module | Lines | Why kept |
|---|---|---|
| lib/web_scraper.py | 324 | transcript_processor.py imports chunk_text() |
| lib/new_pipeline.py | 1,637 | Active Stream B library management CLI (created Apr 13) |
| lib/peertube_scraper.py | 580 | Only way to ingest PeerTube transcripts |
| lib/extractor.py | 601 | Used by cmd_run CLI for batch PDF processing |

Verification

| Check | Result |
|---|---|
| Compile (recon.py) | OK |
| Import (recon module) | OK |
| Import (dispatcher, filing, processors) | OK |
| cmd_service assertions | extract worker absent, dispatch_loop present, filing_worker_loop present |
| Zero crawler references in .py files | Confirmed |
| Service restart | Clean, active |
| Thread count | 13 tasks (was 14; extract removed) |
| Threads started | enrich, embed, dispatcher, filing, progress, dashboard, metrics |
| Extract thread | Absent (confirmed by logs: no [extract] Stage started) |
| Errors (60s window) | 0 |
| DB rows | catalogue=29,812, documents=29,812 (unchanged) |
| Dashboard | Responsive |
| Hopper | Empty |
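
The cmd_service assertions can be expressed as a small source-level check. This is a sketch under stated assumptions, not the check that was actually run. It assumes cmd_service is a top-level function in recon.py and that the names from the table (run_extraction, dispatch_loop, filing_worker_loop) appear literally in its body. The helpers `function_source` and `check_cmd_service` are hypothetical.

```python
import ast


def function_source(module_source: str, name: str) -> str:
    """Return the source text of the top-level function `name`."""
    tree = ast.parse(module_source)  # also fails fast on syntax errors
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            segment = ast.get_source_segment(module_source, node)
            if segment is not None:
                return segment
    raise LookupError(f"function {name!r} not found")


def check_cmd_service(module_source: str) -> None:
    """Assert the extract worker is gone and the remaining loops are still wired in."""
    body = function_source(module_source, "cmd_service")
    # Names come from the verification table; their exact spelling in the
    # real function body is an assumption.
    assert "run_extraction" not in body, "extract worker still referenced"
    assert "dispatch_loop" in body, "dispatch_loop missing"
    assert "filing_worker_loop" in body, "filing_worker_loop missing"
```

Run against the file contents, for example `check_cmd_service(open("recon.py").read())`, it raises AssertionError on the first failed condition and returns silently otherwise.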

Commit

  • Commit: efae402 on refactor branch
  • Diff: 2 files changed, 3 insertions(+), 521 deletions(-)
  • Pushed to: forge.echo6.co/matt/recon (origin/refactor)