Documents the initial 5c-2 failure (ignore_errors=True + root-owned legacy files), the recovery procedure (hopper reconstitution, orphan cleanup, processor fix), and the successful retry with pipeline drain in progress.
4.6 KiB
Phase 5c-2: Service Start & Transcript Drain
Objective
Start recon.service with the refactored dispatch loop and drain the 2,259
PeerTube transcripts (unprocessed in Phase 5b) through the full pipeline:
dispatch → extract → enrich → embed → complete.
Initial Attempt (2026-04-14 ~18:42 UTC)
What happened
The service started cleanly — all threads (extract, enrich, embed, dispatcher,
filing, progress, dashboard) came up. The dispatcher found 2,259 .txt +
.meta.json pairs in acquired/stream/ and began dispatching them to
transcript_processor.pre_flight().
All 2,259 items crashed with:
Permission denied: '/opt/recon/data/processing/{hash}/meta.json'
Root cause
The Phase 5b staging script (stage_hopper.py) used shutil.copy2() to copy
meta.json sidecars from /mnt/library/_sources/streamecho6/ into the hopper.
The source files were root-owned, and copy2 preserves permissions — so the
hopper copies were also root-owned.
When transcript_processor.pre_flight() ran:
- It hashed the content file (worked —
.txtwas world-readable) - It ran stale cleanup:
shutil.rmtree(proc_dir, ignore_errors=True) - The
ignore_errors=Truesilently failed to remove root-owned files inside processing dirs that already existed from a previous (legacy) PeerTube scraper run - It tried to read the (now root-owned)
meta.jsoninside the half-cleaned processing dir → Permission denied
The ignore_errors=True was the critical bug — it turned a loud, diagnosable
failure into a silent state corruption that cascaded into a different error
2 steps later.
Resulting state
- 2,259 "dirty" processing dirs with root-owned
meta.json+transcript.txtand zvx-ownedpage_*.txt(from the partially-completed page split) - Zero DB rows for any of the 2,259 items (pre_flight crashed before
add_to_catalogue/queue_document) - 3 "clean" processing dirs from Phase 3/4 test items (intact, with DB rows)
- Hopper drained (all files consumed by dispatcher before crash)
Recovery (2026-04-14 ~20:14 UTC)
Pre-recovery verification
| Check | Expected | Actual |
|---|---|---|
| recon.service | inactive | inactive |
| processing/ dirs | 2,262 | 2,262 |
| acquired/stream/ | empty | empty |
| catalogue rows | 27,553 | 27,553 |
| documents rows | 27,553 | 27,553 |
DB backup: /tmp/recon.db.phase5c2recovery.20260414.bak
(md5: 99031c0994160e3a844375a9a635d616)
Section 1: Classify processing dirs
- 3 PRESERVE — have catalogue rows (Phase 3/4 test items):
172f39ae,380dbc78,93aad72f - 2,259 ORPHAN — no catalogue or documents rows
Section 2: Reconstitute hopper
Copied transcript.txt → {hash}.txt and meta.json → {hash}.meta.json
from each orphan processing dir back into acquired/stream/.
Result: 2,259 .txt + 2,259 .meta.json, all zvx-owned, 0 errors.
Section 3: Delete orphan processing dirs
sudo rm -rf on all 2,259 orphan dirs (sudo required for root-owned files
on local disk). 3 preserved dirs confirmed intact.
Section 4: Fix the bug
Replaced shutil.rmtree(proc_dir, ignore_errors=True) with explicit
try/except that logs and re-raises:
if os.path.exists(proc_dir):
try:
shutil.rmtree(proc_dir)
except Exception as e:
logger.error("Stale cleanup failed for %s: %s", proc_dir, e)
raise
Applied to both transcript_processor.py and pdf_processor.py.
Section 5: Commit
- Commit:
9fa60f9onrefactorbranch - Message: "Fix: stale cleanup in processors must fail loudly on permission errors"
- Pushed to:
forge.echo6.co/matt/recon(origin/refactor)
Section 6: Retry service start
Service started at 20:15:57 UTC. All threads came up cleanly.
Dispatcher consumed all 2,259 hopper items on first cycle (~20:16). Items immediately began flowing through the pipeline:
| Time (UTC) | extracted | enriching | enriched | complete |
|---|---|---|---|---|
| 20:17:51 | 2,137 | 16 | 31 | 27,634 |
| 20:18:51 | 2,061 | 16 | 15 | 27,717 |
| 20:19:51 | 1,987 | 16 | 3 | 27,803 |
| 20:20:51 | 1,917 | 16 | 36 | 27,840 |
| 20:21:51 | 1,849 | 16 | 26 | 27,918 |
Throughput: ~57 items/min completing. Zero errors in 5-minute stability window.
ETA for full drain: ~32 minutes from 20:22 UTC (~20:54 UTC).
Background monitor running (PID 554723), writing to
/tmp/phase5c2recovery_initial_log.txt.
Files Modified
| File | Change |
|---|---|
lib/processors/transcript_processor.py |
Stale cleanup: fail loudly |
lib/processors/pdf_processor.py |
Stale cleanup: fail loudly |