refactored-recon/phases/phase-5c1-service-rewire.md
Matt 581f0017f0 Phase 5c-1: service rewire (code only, not started)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 18:33:15 +00:00


# Phase 5c-1: Service Loop Rewire
**Executed:** 2026-04-14, 18:10–18:30 UTC
---
## Backup
| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db (pre-Phase 5c-1) | CT 130: `/tmp/recon.db.phase5c1.20260414.bak` | `db48369c9fd0937d1b4869196d3fc19b` |
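
The recorded hash can be rechecked before the backup is relied on for a restore. The following is a minimal sketch, assuming the backup is a regular file readable from wherever the script runs; `md5_of_file` and `backup_matches` are hypothetical helpers, not part of RECON.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(path, expected_md5):
    """True if the file at `path` hashes to `expected_md5` (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

On CT 130 this would be called as `backup_matches("/tmp/recon.db.phase5c1.20260414.bak", "db48369c9fd0937d1b4869196d3fc19b")`.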
---
## What This Phase Does
Rewires the RECON service loop (`cmd_service()` in `recon.py`) to use the new dispatcher + filing worker architecture built in Phases 2–4. **Code changes only: the service has NOT been started.** Phase 5c-2 will do the first live run.
Three files modified, zero data changes.
---
## Files Changed
| File | Lines Added | Lines Removed | Changes |
|------|-------------|---------------|---------|
| `lib/dispatcher.py` | 28 | 0 | Added `dispatch_loop()` — service-thread wrapper around `dispatch_once()` |
| `lib/filing.py` | 57 | 0 | Added `filing_worker_loop()` — watches for `status=complete` items in processing/ |
| `recon.py` | 98 | 242 | Rewired `cmd_service()`, removed 4 old loop definitions |
---
## Service Thread List: Before vs After
| # | Before (old) | After (new) | Notes |
|---|-------------|-------------|-------|
| 1 | extract (stage_loop) | dispatcher (dispatch_loop) | **NEW** — scans acquired/ subfolders |
| 2 | enrich (stage_loop) | extract (stage_loop) | **KEPT** — vestigial, will be removed in Phase 6 |
| 3 | embed (stage_loop) | enrich (stage_loop) | **KEPT** |
| 4 | scanner (scanner_loop) | embed (stage_loop) | **KEPT** |
| 5 | peertube (peertube_scanner_loop) | filing (filing_worker_loop) | **NEW** — files completed items to library |
| 6 | crawler (crawler_scheduler_loop) | progress (progress_loop) | **KEPT** |
| 7 | organizer (organizer_loop) | dashboard (start_dashboard) | **KEPT** |
| 8 | progress (progress_loop) | — | — |
| 9 | dashboard (start_dashboard) | — | — |

**Removed threads:**

- `scanner_loop`: scanned the library tree for new PDFs; replaced by the dispatcher scanning `acquired/` dirs
- `peertube_scanner_loop`: ingested PeerTube transcripts; replaced by dispatcher + transcript processor
- `crawler_scheduler_loop`: crawled configured websites; disabled in config, superseded by the acquisition module approach
- `organizer_loop`: filed completed docs using the old `organize_document()`; replaced by `filing_worker_loop`

**Added threads:**

- `dispatch_loop`: runs `dispatch_once()` every 30s (configurable via `service.dispatch_interval`), dispatching content from `acquired/{subfolder}/` to the appropriate processor
- `filing_worker_loop`: queries for `status='complete' AND organized_at IS NULL AND path LIKE '/opt/recon/data/processing/%'` every 30s (configurable via `service.filing_interval`) and files matching items to `library/Domain/Subdomain/`

**Vestigial extract worker:** The extract stage worker is kept as a no-op safety net. Processors do their own extraction inline, so the extract worker will find nothing to do. It will be removed in the Phase 6 cleanup.
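
The new thread list can be pictured as a set of named threads started from one place. This is only a sketch: the thread names come from the "After" column above, but `start_service_threads` is a hypothetical helper, the use of daemon threads is an assumption, and the real `cmd_service()` and loop signatures are not shown in this document.

```python
import threading

# Thread names from the "After" column of the table above.
NEW_THREAD_NAMES = [
    "dispatcher", "extract", "enrich", "embed",
    "filing", "progress", "dashboard",
]


def start_service_threads(targets):
    """Start one daemon thread per (name, callable) pair and return them.

    `targets` is a list of (name, zero-argument callable) tuples; the
    callables stand in for dispatch_loop, stage_loop, and so on.
    """
    threads = []
    for name, target in targets:
        thread = threading.Thread(target=target, name=name, daemon=True)
        thread.start()
        threads.append(thread)
    return threads
```

Naming each thread after its role keeps log lines and thread dumps attributable to a pipeline stage.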
---
## Key Design Decisions
### Filing worker safety rail
The filing worker query includes `AND path LIKE '/opt/recon/data/processing/%'` to ensure it only touches items that went through the new dispatcher → processor pipeline. Legacy items at other paths are left alone.
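
The effect of the path filter can be illustrated with a small query. This sketch assumes `recon.db` is a SQLite database and uses a hypothetical table name `documents` with an `id` column; only the `status`, `organized_at`, and `path` conditions come from this document.

```python
import sqlite3

# WHERE clause as described above; table and id column are assumptions.
FILING_QUERY = """
    SELECT id, path
    FROM documents
    WHERE status = 'complete'
      AND organized_at IS NULL
      AND path LIKE '/opt/recon/data/processing/%'
    ORDER BY id
"""


def find_items_to_file(conn):
    """Return (id, path) rows that are complete, unfiled, and under processing/."""
    return conn.execute(FILING_QUERY).fetchall()
```

A completed, unfiled row whose path lies outside `/opt/recon/data/processing/` is not returned, which is what leaves legacy items alone.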
### Resilient loops
Both `dispatch_loop()` and `filing_worker_loop()` catch every exception raised by their inner logic, log it, and continue with the next iteration. Because the loops never raise to the caller, a single processing error cannot kill a service thread.
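
The log-and-continue pattern looks roughly like the sketch below. It is not the code in `lib/dispatcher.py` or `lib/filing.py`: `resilient_loop`, the logger name, and the `stop_event` shutdown mechanism are assumptions made for illustration.

```python
import logging
import threading

log = logging.getLogger("recon.service")  # hypothetical logger name


def resilient_loop(step, interval, stop_event, name="worker"):
    """Call step() every `interval` seconds until stop_event is set.

    Any Exception raised by step() is logged with its traceback and the
    loop moves on to the next iteration instead of propagating.
    """
    while not stop_event.is_set():
        try:
            step()
        except Exception:
            log.exception("%s: iteration failed; continuing", name)
        # wait() sleeps for the interval but returns early on shutdown.
        stop_event.wait(interval)
```

Note that `except Exception` deliberately leaves `KeyboardInterrupt` and `SystemExit` uncaught, so the process can still be stopped.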
### Config-driven intervals
New config keys `service.dispatch_interval` and `service.filing_interval` control polling frequency (both default to 30s if not specified in config.yaml).
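
A lookup with the 30s fallback could be written as follows. This is a sketch that assumes `config.yaml` has already been parsed into nested dictionaries; `get_interval` is a hypothetical helper, not a RECON function.

```python
DEFAULT_INTERVAL_SECONDS = 30


def get_interval(config, key):
    """Return config["service"][key], or 30 if the section or key is absent."""
    # `or {}` also covers a `service:` section that is present but empty.
    service = config.get("service") or {}
    value = service.get(key, DEFAULT_INTERVAL_SECONDS)
    return DEFAULT_INTERVAL_SECONDS if value is None else value
```

For example, `get_interval({"service": {"dispatch_interval": 10}}, "dispatch_interval")` returns `10`, while `get_interval({}, "filing_interval")` returns `30`.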
---
## Verification Results
| Check | Result |
|-------|--------|
| `py_compile recon.py` | OK |
| `py_compile lib/dispatcher.py` | OK |
| `py_compile lib/filing.py` | OK |
| `import dispatch_once, dispatch_loop` | OK |
| `import file_processed_item, filing_worker_loop` | OK |
| `import recon` | OK |
| `scanner_loop` not in cmd_service | OK |
| `peertube_scanner_loop` not in cmd_service | OK |
| `crawler_scheduler_loop` not in cmd_service | OK |
| `organizer_loop` not in cmd_service | OK |
| `dispatch_loop` in cmd_service | OK |
| `filing_worker_loop` in cmd_service | OK |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Hopper (acquired/stream/) | 2,259 pairs (unchanged) |
| DB counts | 27,553/27,553 (unchanged) |
| Qdrant points | 2,309,260 (unchanged) |
---
## Service NOT Started
This phase is code-only. The service has NOT been started. Phase 5c-2 will:
1. Start the service
2. Watch the dispatcher pick up the 2,259 hopper items
3. Monitor the pipeline processing them through enrich → embed → filing
---
## Commit
| Repo | Branch | Hash | Message |
|------|--------|------|---------|
| matt/recon | refactor | `d9aed35` | Phase 5c-1: dispatcher loop, filing worker loop, service rewire |