refactored-recon/phases/phase-5c1-service-rewire.md
Matt 581f0017f0 Phase 5c-1: service rewire (code only, not started)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:33:15 +00:00


Phase 5c-1: Service Loop Rewire

Executed: 2026-04-14, 18:10–18:30 UTC


Backup

| Item | Location | MD5 Hash |
| --- | --- | --- |
| recon.db (pre-Phase 5c-1) | CT 130: /tmp/recon.db.phase5c1.20260414.bak | db48369c9fd0937d1b4869196d3fc19b |
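
The recorded hash can be re-checked before the backup is ever relied on. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_file` is hypothetical and is not part of RECON.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large database file is never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On CT 130, comparing `md5_of_file("/tmp/recon.db.phase5c1.20260414.bak")` with `db48369c9fd0937d1b4869196d3fc19b` would confirm the backup is intact.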

What This Phase Does

Rewires the RECON service loop (cmd_service() in recon.py) to use the new dispatcher + filing worker architecture built in Phases 2–4. Code changes only: the service has NOT been started. Phase 5c-2 will do the first live run.

Three files modified, zero data changes.


Files Changed

| File | Lines Added | Lines Removed | Changes |
| --- | --- | --- | --- |
| lib/dispatcher.py | 28 | 0 | Added dispatch_loop(), a service-thread wrapper around dispatch_once() |
| lib/filing.py | 57 | 0 | Added filing_worker_loop(), which watches for status=complete items in processing/ |
| recon.py | 98 | 242 | Rewired cmd_service(), removed 4 old loop definitions |

Service Thread List: Before vs After

| # | Before (old) | After (new) | Notes |
| --- | --- | --- | --- |
| 1 | extract (stage_loop) | dispatcher (dispatch_loop) | NEW: scans acquired/ subfolders |
| 2 | enrich (stage_loop) | extract (stage_loop) | KEPT: vestigial, will be removed in Phase 6 |
| 3 | embed (stage_loop) | enrich (stage_loop) | KEPT |
| 4 | scanner (scanner_loop) | embed (stage_loop) | KEPT |
| 5 | peertube (peertube_scanner_loop) | filing (filing_worker_loop) | NEW: files completed items to library |
| 6 | crawler (crawler_scheduler_loop) | progress (progress_loop) | KEPT |
| 7 | organizer (organizer_loop) | dashboard (start_dashboard) | KEPT |
| 8 | progress (progress_loop) | | |
| 9 | dashboard (start_dashboard) | | |

Removed threads:

  • scanner_loop — scanned library tree for new PDFs; replaced by dispatcher scanning acquired/ dirs
  • peertube_scanner_loop — ingested PeerTube transcripts; replaced by dispatcher + transcript processor
  • crawler_scheduler_loop — crawled configured websites; disabled in config, superseded by acquisition module approach
  • organizer_loop — filed completed docs using old organize_document(); replaced by filing_worker_loop

Added threads:

  • dispatch_loop — runs dispatch_once() every 30s (configurable via service.dispatch_interval), dispatching content from acquired/{subfolder}/ to the appropriate processor
  • filing_worker_loop — queries for status='complete' AND organized_at IS NULL AND path LIKE '/opt/recon/data/processing/%' every 30s (configurable via service.filing_interval), files items to library/Domain/Subdomain/

Vestigial extract worker: The extract stage worker is kept as a no-op safety net. Processors do their own extraction inline, so the extract worker will find nothing to do. Will be removed in Phase 6 cleanup.


Key Design Decisions

Filing worker safety rail

The filing worker query includes AND path LIKE '/opt/recon/data/processing/%' to ensure it only touches items that went through the new dispatcher → processor pipeline. Legacy items at other paths are left alone.
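
The effect of the path filter can be illustrated with a small in-memory example. This is a sketch only: it assumes recon.db is SQLite, and the table name `documents`, the `id` column, the `enriched` status value, and the sample paths are hypothetical. Only the three WHERE conditions come from the description above.

```python
import sqlite3

PROCESSING_PREFIX = "/opt/recon/data/processing/"


def find_items_to_file(conn):
    """Return (id, path) rows that are complete, unfiled, and under processing/."""
    # Table and id column are assumptions; the WHERE clause mirrors the text.
    return conn.execute(
        "SELECT id, path FROM documents "
        "WHERE status = 'complete' "
        "AND organized_at IS NULL "
        "AND path LIKE ? "
        "ORDER BY id",
        (PROCESSING_PREFIX + "%",),
    ).fetchall()


def demo():
    """Build a throwaway database with one eligible row and three excluded rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents "
        "(id INTEGER PRIMARY KEY, status TEXT, organized_at TEXT, path TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (id, status, organized_at, path) VALUES (?, ?, ?, ?)",
        [
            # Eligible: complete, not yet filed, under processing/.
            (1, "complete", None, "/opt/recon/data/processing/a.pdf"),
            # Legacy path outside processing/: left alone.
            (2, "complete", None, "/opt/recon/data/library/b.pdf"),
            # Already filed (organized_at is set): skipped.
            (3, "complete", "2026-04-01", "/opt/recon/data/processing/c.pdf"),
            # Not complete yet: skipped.
            (4, "enriched", None, "/opt/recon/data/processing/d.pdf"),
        ],
    )
    return find_items_to_file(conn)
```

In this sketch only row 1 is returned; the legacy-path row is excluded purely by the `LIKE` condition, which is the safety rail described above.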

Resilient loops

Both dispatch_loop() and filing_worker_loop() catch all exceptions from their inner logic and log + continue. Service threads never raise to the caller, preventing a single processing error from killing the thread.
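
The log-and-continue pattern can be sketched as follows. This is not the code in lib/dispatcher.py or lib/filing.py: the function name `resilient_loop`, the `stop_event` parameter, and the logger name are assumptions made for illustration.

```python
import logging
import threading

log = logging.getLogger("recon.service")


def resilient_loop(step, interval, stop_event, name="worker"):
    """Call step() repeatedly; log any exception instead of ending the thread."""
    while not stop_event.is_set():
        try:
            step()
        except Exception:
            # Record the traceback and fall through to the next iteration.
            log.exception("%s: iteration failed, continuing", name)
        # Sleep for the polling interval, waking early if a stop is requested.
        stop_event.wait(interval)
```

A thread could then be started with something like `threading.Thread(target=resilient_loop, args=(dispatch_once, 30, stop_event), daemon=True)`. Catching `Exception` rather than `BaseException` is a choice made in this sketch so that interpreter shutdown signals still propagate.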

Config-driven intervals

New config keys service.dispatch_interval and service.filing_interval control polling frequency (both default to 30s if not specified in config.yaml).
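
A lookup with this fallback behaviour might look like the sketch below. It assumes config.yaml has already been parsed into a nested dict with a top-level `service` mapping; the helper name `get_interval` is hypothetical.

```python
DEFAULT_INTERVAL_SECONDS = 30


def get_interval(config, key):
    """Return service.<key> from a parsed config dict, or 30 if it is not set."""
    # Tolerate a missing or empty "service" section.
    service = config.get("service") or {}
    value = service.get(key)
    return DEFAULT_INTERVAL_SECONDS if value is None else value
```

With an empty config, `get_interval({}, "dispatch_interval")` falls back to 30, while `get_interval({"service": {"filing_interval": 10}}, "filing_interval")` returns the configured 10.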


Verification Results

| Check | Result |
| --- | --- |
| py_compile recon.py | OK |
| py_compile lib/dispatcher.py | OK |
| py_compile lib/filing.py | OK |
| import dispatch_once, dispatch_loop | OK |
| import file_processed_item, filing_worker_loop | OK |
| import recon | OK |
| scanner_loop not in cmd_service | OK |
| peertube_scanner_loop not in cmd_service | OK |
| crawler_scheduler_loop not in cmd_service | OK |
| organizer_loop not in cmd_service | OK |
| dispatch_loop in cmd_service | OK |
| filing_worker_loop in cmd_service | OK |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Hopper (acquired/stream/) | 2,259 pairs (unchanged) |
| DB counts | 27,553/27,553 (unchanged) |
| Qdrant points | 2,309,260 (unchanged) |
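
The "in cmd_service" and "not in cmd_service" checks could be scripted in several ways; how they were actually run is not recorded here. One possible approach, shown as a sketch with hypothetical helper names, inspects the names a function references in its bytecode without importing or starting anything else.

```python
def referenced_names(func):
    """Return the global and attribute names referenced directly by func's bytecode."""
    return set(func.__code__.co_names)


def check_rewire(func, removed, added):
    """Report whether all removed names are absent and all added names are present."""
    names = referenced_names(func)
    return {
        "removed_absent": all(name not in names for name in removed),
        "added_present": all(name in names for name in added),
    }
```

Applied to `recon.cmd_service` with the four removed loop names and the two new ones, both values would be expected to be `True`. Names used only inside nested functions are not covered by this sketch.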

Service NOT Started

This phase is code-only. The service has NOT been started. Phase 5c-2 will:

  1. Start the service
  2. Watch the dispatcher pick up the 2,259 hopper items
  3. Monitor the pipeline processing them through enrich → embed → filing

Commit

| Repo | Branch | Hash | Message |
| --- | --- | --- | --- |
| matt/recon | refactor | d9aed35 | Phase 5c-1: dispatcher loop, filing worker loop, service rewire |