mirror of
https://github.com/zvx-echo6/refactored-recon.git
synced 2026-05-20 14:44:39 +02:00
Phase 5c-1: service rewire (code only, not started)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent
6b49d4c107
commit
581f0017f0
1 changed files with 111 additions and 0 deletions
111
phases/phase-5c1-service-rewire.md
Normal file
@@ -0,0 +1,111 @@
# Phase 5c-1: Service Loop Rewire

**Executed:** 2026-04-14, 18:10–18:30 UTC

---

## Backup

| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db (pre-Phase 5c-1) | CT 130: `/tmp/recon.db.phase5c1.20260414.bak` | `db48369c9fd0937d1b4869196d3fc19b` |
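
Before relying on the backup, the recorded hash can be re-checked against the file on disk. The following is a minimal sketch, assuming it runs where the backup path in the table is readable; `md5_of_file` and `backup_is_intact` are hypothetical helpers for illustration, not part of RECON.

```python
import hashlib

# Values copied from the backup table above.
BACKUP_PATH = "/tmp/recon.db.phase5c1.20260414.bak"
EXPECTED_MD5 = "db48369c9fd0937d1b4869196d3fc19b"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path=BACKUP_PATH, expected=EXPECTED_MD5):
    """True if the file at `path` still matches the recorded MD5 hash."""
    return md5_of_file(path) == expected
```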

---

## What This Phase Does

Rewires the RECON service loop (`cmd_service()` in `recon.py`) to use the new dispatcher + filing worker architecture built in Phases 2–4. **Code changes only: the service has NOT been started.** Phase 5c-2 will do the first live run.

Three files modified, zero data changes.

---

## Files Changed

| File | Lines Added | Lines Removed | Changes |
|------|-------------|---------------|---------|
| `lib/dispatcher.py` | 28 | 0 | Added `dispatch_loop()`, a service-thread wrapper around `dispatch_once()` |
| `lib/filing.py` | 57 | 0 | Added `filing_worker_loop()`, which watches for `status=complete` items in processing/ |
| `recon.py` | 98 | 242 | Rewired `cmd_service()`, removed 4 old loop definitions |

---

## Service Thread List: Before vs After

| # | Before (old) | After (new) | Notes |
|---|-------------|-------------|-------|
| 1 | extract (stage_loop) | dispatcher (dispatch_loop) | **NEW**: scans acquired/ subfolders |
| 2 | enrich (stage_loop) | extract (stage_loop) | **KEPT**: vestigial, will be removed in Phase 6 |
| 3 | embed (stage_loop) | enrich (stage_loop) | **KEPT** |
| 4 | scanner (scanner_loop) | embed (stage_loop) | **KEPT** |
| 5 | peertube (peertube_scanner_loop) | filing (filing_worker_loop) | **NEW**: files completed items to library |
| 6 | crawler (crawler_scheduler_loop) | progress (progress_loop) | **KEPT** |
| 7 | organizer (organizer_loop) | dashboard (start_dashboard) | **KEPT** |
| 8 | progress (progress_loop) | - | - |
| 9 | dashboard (start_dashboard) | - | - |

**Removed threads:**

- `scanner_loop`: scanned library tree for new PDFs; replaced by dispatcher scanning acquired/ dirs
- `peertube_scanner_loop`: ingested PeerTube transcripts; replaced by dispatcher + transcript processor
- `crawler_scheduler_loop`: crawled configured websites; disabled in config, superseded by acquisition module approach
- `organizer_loop`: filed completed docs using old `organize_document()`; replaced by `filing_worker_loop`

**Added threads:**

- `dispatch_loop`: runs `dispatch_once()` every 30s (configurable via `service.dispatch_interval`), dispatching content from `acquired/{subfolder}/` to the appropriate processor
- `filing_worker_loop`: queries for `status='complete' AND organized_at IS NULL AND path LIKE '/opt/recon/data/processing/%'` every 30s (configurable via `service.filing_interval`), files items to `library/Domain/Subdomain/`

**Vestigial extract worker:** The extract stage worker is kept as a no-op safety net. Processors do their own extraction inline, so the extract worker will find nothing to do. It will be removed in the Phase 6 cleanup.

---

## Key Design Decisions

### Filing worker safety rail

The filing worker query includes `AND path LIKE '/opt/recon/data/processing/%'` to ensure it only touches items that went through the new dispatcher → processor pipeline. Legacy items at other paths are left alone.
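
The effect of the path filter can be illustrated with an in-memory SQLite database. This is only a sketch: the `WHERE` predicate is the one quoted in this document, but the table name `documents`, the `id` column, the sample rows, and the `enriched` status value are assumptions made for the example.

```python
import sqlite3

# Predicate taken from this document; table name and `id` column are assumed.
FILING_QUERY = """
    SELECT id, path
    FROM documents
    WHERE status = 'complete'
      AND organized_at IS NULL
      AND path LIKE '/opt/recon/data/processing/%'
    ORDER BY id
"""


def find_items_to_file(conn):
    """Return (id, path) rows that the filing worker would pick up."""
    return conn.execute(FILING_QUERY).fetchall()


def demo():
    """Build a throwaway database and show which rows pass the safety rail."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, path TEXT, status TEXT, organized_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (path, status, organized_at) VALUES (?, ?, ?)",
        [
            ("/opt/recon/data/processing/a.pdf", "complete", None),  # eligible
            ("/opt/recon/data/library/legacy.pdf", "complete", None),  # legacy path
            ("/opt/recon/data/processing/b.pdf", "enriched", None),  # not complete
            ("/opt/recon/data/processing/c.pdf", "complete", "2026-04-14"),  # already filed
        ],
    )
    return find_items_to_file(conn)
```

In this sketch only the first row is returned; the legacy item is skipped purely because of its path, which is the behaviour the safety rail is meant to guarantee.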

### Resilient loops

Both `dispatch_loop()` and `filing_worker_loop()` catch all exceptions from their inner logic, log them, and continue. Service threads never raise to the caller, so a single processing error cannot kill the thread.
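
The catch, log, and continue pattern can be sketched as a generic wrapper. This is not the code from `lib/dispatcher.py` or `lib/filing.py`: `resilient_loop`, its parameters, the logger name, and the use of a `threading.Event` for shutdown are all assumptions chosen to keep the example self-contained.

```python
import logging
import threading

log = logging.getLogger("recon.service")  # logger name is an assumption


def resilient_loop(name, work_once, interval, stop_event):
    """Run work_once() repeatedly until stop_event is set.

    Any exception raised by work_once() is logged with its traceback and
    swallowed, so one bad iteration cannot end the thread.
    """
    while not stop_event.is_set():
        try:
            work_once()
        except Exception:
            log.exception("%s: iteration failed, continuing", name)
        # Sleep for `interval` seconds, but wake immediately on shutdown.
        stop_event.wait(interval)
```

A service thread could then be started with something like `threading.Thread(target=resilient_loop, args=("dispatcher", dispatch_once, 30, stop), daemon=True).start()`, where `stop` is a shared `threading.Event`.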

### Config-driven intervals

New config keys `service.dispatch_interval` and `service.filing_interval` control polling frequency (both default to 30s if not specified in `config.yaml`).
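
The default-on-missing behaviour can be sketched as follows, assuming `config.yaml` has already been parsed into a nested dictionary. `get_interval` is a hypothetical helper and may not match how RECON actually reads its config.

```python
DEFAULT_INTERVAL_SECONDS = 30  # default stated in this document


def get_interval(config, key):
    """Look up service.<key> in a nested config dict, defaulting to 30s."""
    service = config.get("service") or {}  # tolerate a missing or empty section
    value = service.get(key)
    return DEFAULT_INTERVAL_SECONDS if value is None else value


# Example: only the dispatch interval is overridden.
example_config = {"service": {"dispatch_interval": 10}}
dispatch_interval = get_interval(example_config, "dispatch_interval")
filing_interval = get_interval(example_config, "filing_interval")
```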

---

## Verification Results

| Check | Result |
|-------|--------|
| `py_compile recon.py` | OK |
| `py_compile lib/dispatcher.py` | OK |
| `py_compile lib/filing.py` | OK |
| `import dispatch_once, dispatch_loop` | OK |
| `import file_processed_item, filing_worker_loop` | OK |
| `import recon` | OK |
| `scanner_loop` not in cmd_service | OK |
| `peertube_scanner_loop` not in cmd_service | OK |
| `crawler_scheduler_loop` not in cmd_service | OK |
| `organizer_loop` not in cmd_service | OK |
| `dispatch_loop` in cmd_service | OK |
| `filing_worker_loop` in cmd_service | OK |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Hopper (acquired/stream/) | 2,259 pairs (unchanged) |
| DB counts | 27,553/27,553 (unchanged) |
| Qdrant points | 2,309,260 (unchanged) |
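
The "in / not in `cmd_service`" rows could be reproduced with a simple text check over the function's source. The document does not say how these checks were actually run, so this is one possible approach; `check_service_source` is a hypothetical helper, and in practice the source text might come from `inspect.getsource(recon.cmd_service)`.

```python
# Loop names taken from the verification table above.
REMOVED = (
    "scanner_loop",
    "peertube_scanner_loop",
    "crawler_scheduler_loop",
    "organizer_loop",
)
ADDED = ("dispatch_loop", "filing_worker_loop")


def check_service_source(source):
    """Return a list of problems found in the source text; empty means OK."""
    problems = [f"{name} still referenced" for name in REMOVED if name in source]
    problems += [f"{name} missing" for name in ADDED if name not in source]
    return problems
```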

---

## Service NOT Started

This phase is code-only. The service has NOT been started. Phase 5c-2 will:

1. Start the service
2. Watch the dispatcher pick up the 2,259 hopper items
3. Monitor the pipeline processing them through enrich → embed → filing

---

## Commit

| Repo | Branch | Hash | Message |
|------|--------|------|---------|
| matt/recon | refactor | `d9aed35` | Phase 5c-1: dispatcher loop, filing worker loop, service rewire |