Commit graph

3 commits

Author SHA1 Message Date
efae4023f6 Phase 6c: remove vestigial extract worker, dead crawler, .bak files
recon.py:
- Remove extract stage_loop thread from cmd_service(). Confirmed
  vestigial: 0 queued items, silent logs over 24+ hour run. The new
  processors do extraction inline in pre_flight().
- Remove cmd_crawl CLI subcommand and its argparse registration.
- Clean up associated imports and variables.

Deleted:
- lib/crawler.py (432 lines) -- old web crawler subsystem, only
  referenced by the removed CLI subcommand.
- 24 .bak files (untracked pre-edit safety backups, originals
  preserved in git history).

Investigation found the four old loop function definitions
(scanner_loop, peertube_scanner_loop, crawler_scheduler_loop,
organizer_loop) were already deleted in Phase 5c-1.

Modules investigated and KEPT:
- lib/web_scraper.py -- exports chunk_text() used by transcript_processor
- lib/new_pipeline.py -- active Stream B library management CLI tool
- lib/peertube_scraper.py -- only mechanism for transcript ingestion
- lib/extractor.py -- would activate for new PDFs via cmd_run CLI

Service restart verified: 6 threads (dispatcher, enrich, embed,
filing, progress, dashboard), no extract worker, zero errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 23:46:00 +00:00
d9aed35fd7 Phase 5c-1: dispatcher loop, filing worker loop, service rewire
Adds dispatch_loop() alongside dispatch_once() for service-thread use.
Adds filing_worker_loop() that watches for status=complete items in
/opt/recon/data/processing/ and files them to library/Domain/Subdomain/.

Rewires cmd_service() to run the new architecture:
- Removed: scanner_loop, peertube_scanner_loop, crawler_scheduler_loop,
  organizer_loop (all replaced by dispatcher + new filing worker)
- Kept: enrich and embed stage workers, progress, dashboard
- Kept (vestigial): extract stage worker — will be removed in Phase 6 cleanup
- Added: dispatcher loop thread, filing worker thread

Phase 5c-1 of the refactor. Service not yet started — Phase 5c-2 will do that.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:30:58 +00:00
563c16bb71 Initial commit: RECON codebase baseline
Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:57:23 +00:00