recon.py:
- Remove the extract stage_loop thread from cmd_service(). Confirmed
vestigial: 0 queued items and no log output over a 24+ hour run. The
new processors do extraction inline in pre_flight().
- Remove cmd_crawl CLI subcommand and its argparse registration.
- Clean up associated imports and variables.
Deleted:
- lib/crawler.py (432 lines) -- old web crawler subsystem, only
referenced by the removed CLI subcommand.
- 24 .bak files (untracked pre-edit safety backups, originals
preserved in git history).
Investigation found the four old loop function definitions
(scanner_loop, peertube_scanner_loop, crawler_scheduler_loop,
organizer_loop) were already deleted in Phase 5c-1.
Modules investigated and KEPT:
- lib/web_scraper.py -- exports chunk_text() used by transcript_processor
- lib/new_pipeline.py -- active Stream B library management CLI tool
- lib/peertube_scraper.py -- only mechanism for transcript ingestion
- lib/extractor.py -- would activate for new PDFs via cmd_run CLI
Service restart verified: 6 threads (dispatcher, enrich, embed,
filing, progress, dashboard), no extract worker, zero errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds dispatch_loop() alongside dispatch_once() for service-thread use.
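
For illustration only, a minimal sketch of how a loop wrapper around a
single-pass dispatch function can be shaped for service-thread use. The
signature, the stop_event parameter, and the interval default are
assumptions, not the actual recon.py code:

```python
import threading


def dispatch_loop(dispatch_once, stop_event, interval=5.0):
    """Call dispatch_once() repeatedly until stop_event is set.

    Hypothetical signature: the real dispatch_loop() may take no
    arguments. Returns the number of passes completed.
    """
    passes = 0
    while not stop_event.is_set():
        try:
            dispatch_once()
        except Exception as exc:
            # A failed pass must not kill the service thread.
            print(f"dispatch pass failed: {exc}")
        passes += 1
        # Event.wait() returns early once stop_event is set,
        # so shutdown does not have to wait out the full interval.
        stop_event.wait(interval)
    return passes
```

Keeping dispatch_once() as the unit of work means the CLI can still run
a single pass while the service thread reuses the same code path.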
Adds filing_worker_loop() that watches for status=complete items in
/opt/recon/data/processing/ and files them to library/Domain/Subdomain/.
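
A rough sketch of one filing pass, under stated assumptions: each item
is a directory containing a meta.json with "status", "domain", and
"subdomain" keys. That layout and the function name are hypothetical;
the commit only states that status=complete items are filed under
library/Domain/Subdomain/.

```python
import json
import shutil
from pathlib import Path


def file_completed_items(processing_dir, library_dir):
    """Move items marked status=complete into
    library_dir/<domain>/<subdomain>/ and return their new paths.

    Assumes (hypothetically) one directory per item with a meta.json.
    """
    filed = []
    # sorted() materializes the listing before anything is moved.
    for item in sorted(Path(processing_dir).iterdir()):
        meta_path = item / "meta.json"
        if not item.is_dir() or not meta_path.is_file():
            continue
        meta = json.loads(meta_path.read_text())
        if meta.get("status") != "complete":
            continue  # still in an earlier pipeline stage
        dest_parent = Path(library_dir) / meta["domain"] / meta["subdomain"]
        dest_parent.mkdir(parents=True, exist_ok=True)
        dest = dest_parent / item.name
        shutil.move(str(item), str(dest))
        filed.append(dest)
    return filed
```

A worker loop would call a pass like this on an interval; items that
are not yet complete are left in place for the next pass.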
Rewires cmd_service() to run the new architecture:
- Removed: scanner_loop, peertube_scanner_loop, crawler_scheduler_loop,
organizer_loop (all replaced by dispatcher + new filing worker)
- Kept: enrich and embed stage workers, progress, dashboard
- Kept (vestigial): extract stage worker -- will be removed in Phase 6 cleanup
- Added: dispatcher loop thread, filing worker thread
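
The thread layout above could be wired roughly as follows. This is a
sketch, not the real cmd_service(): start_service_threads() and
idle_worker() are hypothetical helpers, and the real workers are
assumed to accept a stop event.

```python
import threading


def idle_worker(stop_event):
    """Placeholder for a real worker loop: block until asked to stop."""
    stop_event.wait()


def start_service_threads(targets, stop_event):
    """Start one named daemon thread per entry in targets.

    targets maps a thread name to a callable taking stop_event.
    """
    threads = []
    for name, target in targets.items():
        t = threading.Thread(
            target=target, args=(stop_event,), name=name, daemon=True
        )
        t.start()
        threads.append(t)
    return threads


if __name__ == "__main__":
    stop = threading.Event()
    # Thread names taken from the list above, extract included.
    names = ["dispatcher", "filing", "enrich", "embed",
             "extract", "progress", "dashboard"]
    threads = start_service_threads({n: idle_worker for n in names}, stop)
    print([t.name for t in threads])
    stop.set()
    for t in threads:
        t.join(timeout=2)
```

Naming each thread makes it straightforward to confirm after a restart
which workers are actually running.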
Phase 5c-1 of the refactor. The service is not started yet; Phase 5c-2 will do that.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>