# Task C: Watchdog Two-Phase Ingest Test

**Date:** 2026-04-13
**Test doc:** `TestDoc_Civil_Governance_Framework_2024.pdf` (2,480 bytes, reportlab-generated)
**Content hash:** `346a65d9d72550df64490ad8e9998622`

---

## Phase A: Acquisition

### Action

- Copied test PDF to `/mnt/library/_acquired/`
- Waited 12s for mtime stability
- Ran `ingest_scan()` manually

### Result: PASS

```
acquired: 1, placed: 0, skipped: 0, failed: 0, duplicates: 0
Acquired TestDoc_Civil_Governance_Framework_2024.pdf -> /mnt/library/_ingest/TestDoc_Civil_Governance_Framework_2024.pdf [346a65d9]
```

### Verification

| Check | Result |
|-------|--------|
| File removed from `_acquired/` | YES |
| File present in `_ingest/` | YES |
| Catalogue entry (status=queued) | YES |
| Documents entry (status=queued) | YES |
| book_title = None (not enriched) | YES |
| organized_at = None | YES |

---

## RECON Pipeline Processing

The running RECON service (`recon.service`) automatically picked up the queued document.

### Timeline

| Stage | Timestamp | Duration |
|-------|-----------|----------|
| Queued | 05:50:14 | - |
| Extracted | 05:50:40 | 26s |
| Enriched | 05:51:05 | 25s |
| Embedded | 05:51:25 | 20s |
| **Total** | | **~71s** |

### Enrichment Results

| Field | Value |
|-------|-------|
| book_title | Civil Governance Framework Analysis |
| book_author | Dr. James Mitchell |
| pages_extracted | 1 |
| concepts_extracted | 2 |
| vectors_inserted | 2 |
| status | complete |

---

## Phase B: Library Placement

### Action

- Ran `ingest_scan()` again after enrichment completed

### Result: PASS

```
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf [Civil Organization/Governance, step 1, 2 vectors]
```

### Verification

| Check | Result |
|-------|--------|
| File removed from `_ingest/` | YES |
| File at `Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` | YES |
| Filename derived from book_title (not original filename) | YES |
| Domain: Civil Organization | YES |
| Subdomain: Governance | YES |
| Collision step: 1 (base, no collision) | YES |
| documents.path updated | YES |
| documents.organized_at set | YES |
| catalogue.path updated | YES |
| file_operations entry created (id=81) | YES |
| Qdrant filename = `Civil_Governance_Framework_Analysis.pdf` | YES |
| Qdrant original_filename = `TestDoc_Civil_Governance_Framework_2024.pdf` | YES |
| Qdrant download_url = `https://files.echo6.co/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` | YES |

---

## Reverse Operation Test

### Action

- Ran `reverse_operation(81, db, config)`

### Result: PASS

```
Reversed operation 81: .../Civil_Governance_Framework_Analysis.pdf -> .../TestDoc_Civil_Governance_Framework_2024.pdf
```

### Verification

| Check | Result |
|-------|--------|
| File back in `_ingest/` | YES |
| File removed from `Civil-Organization/Governance/` | YES |
| file_operations.reversed_at set | YES |
| Qdrant payloads reverted to _ingest paths | YES |
| DB paths reverted to _ingest | YES |

---

## Re-placement After Reverse

### Action

- Cleared `organized_at` (simulating the fix applied to `reverse_operation`)
- Ran `ingest_scan()` again

### Result: PASS

```
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf [Civil Organization/Governance, step 1, 2 vectors]
```

### Final State

- File at correct standardized location
- 2 file_operations records: #81 (reversed), #82 (active)
- Qdrant payloads correct
- All DB records consistent

---

## Bugs Found & Fixed During Test

### Bug 1: Phase B query overwhelmed by unorganized docs (FIXED)

**Problem:** `ingest_scan()` Phase B used `db.get_unorganized(limit=50)`, which returns the 50 oldest unorganized docs. With 29,469 unorganized docs (mostly PeerTube transcripts), the test doc was never reached.

**Fix:** Added a `StatusDB.get_ingest_pending(ingest_dir, limit=50)` method that filters by path (`WHERE path LIKE '/mnt/library/_ingest%'`). Updated `ingest_scan()` to use this instead.

**Files changed:**

- `/opt/recon/lib/status.py`: added `get_ingest_pending()` method
- `/opt/recon/lib/new_pipeline.py`: updated Phase B in `ingest_scan()`

### Bug 2: Reverse doesn't clear organized_at (FIXED)

**Problem:** After reversing a placement, `organized_at` remained set, preventing Phase B from re-triggering placement on the next watchdog cycle.

**Fix:** Added `UPDATE documents SET organized_at = NULL WHERE hash = ?` to `reverse_operation()`.

**Files changed:**

- `/opt/recon/lib/new_pipeline.py`: added organized_at clear in `reverse_operation()`

### Non-bug: Watchdog logging

**Observation:** `recon.py pipeline watch` produces no stdout/stderr output because `run_watchdog()` uses `logging.getLogger('recon.pipeline')`, which only has handlers configured when `setup_logging()` is called for a parent logger during service mode. This is not a functional issue: logs go to `/opt/recon/logs/recon.log` in service mode.
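
The logging observation above follows from how Python's standard `logging` module routes records: a child logger has no output of its own and relies on handlers attached somewhere up its hierarchy. The sketch below reproduces that behaviour in isolation. The logger names `recon_demo` and `recon_demo.pipeline` and the helper `capture_info()` are hypothetical stand-ins; what `setup_logging()` actually configures is not shown in this report, so this illustrates the mechanism only.

```python
import contextlib
import io
import logging

PARENT = "recon_demo"          # hypothetical stand-in for the parent logger
CHILD = PARENT + ".pipeline"   # mirrors the 'recon.pipeline' naming


def capture_info(message, configure_parent):
    """Log one INFO record on the child logger and return everything emitted."""
    parent = logging.getLogger(PARENT)
    parent.setLevel(logging.INFO)
    parent.propagate = False   # keep root-logger handlers out of the demo
    parent.handlers.clear()

    handler_buf = io.StringIO()
    if configure_parent:
        # Assumed equivalent of what a setup routine would do in service mode.
        parent.addHandler(logging.StreamHandler(handler_buf))

    stderr_buf = io.StringIO()
    with contextlib.redirect_stderr(stderr_buf):
        logging.getLogger(CHILD).info(message)

    return handler_buf.getvalue() + stderr_buf.getvalue()


# No handler in the hierarchy: the INFO record is dropped, because the
# module's last-resort handler only emits WARNING and above.
print(repr(capture_info("scan complete", configure_parent=False)))

# Handler on the parent: the child's record propagates up and is written.
print(repr(capture_info("scan complete", configure_parent=True)))
```

If console output were wanted from the standalone `watch` command, attaching a handler to the parent logger before `run_watchdog()` starts would be one option; whether that is desirable is a separate decision from this test.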

---

## Cleanup

- Pipeline disabled: `new_pipeline.enabled: false`
- Watchdog process killed
- Test document left in place at `Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` (valid document, no reason to remove)
- Local copies synced

---

## Verdict: PASS

All phases of the two-phase ingest pipeline work correctly:

1. Phase A acquires files from `_acquired/` to `_ingest/` and queues for processing
2. RECON pipeline processes queued documents normally (extract → enrich → embed)
3. Phase B places enriched documents with standardized filenames derived from `book_title`
4. Reverse operation correctly undoes placement (file, DB, Qdrant)
5. Re-placement after reverse works correctly
6. Two bugs found and fixed during testing (query efficiency + organized_at reset)
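
For reference, the two fixes summarised in item 6 can be sketched against an in-memory SQLite database. This is a minimal illustration, not the code in `/opt/recon/lib/status.py` or `new_pipeline.py`: the table layout, the `id` ordering column, the `organized_at IS NULL` condition, and the helpers `like_prefix()` and `build_demo_db()` are all assumptions. Only the `LIKE` path filter and the `UPDATE ... SET organized_at = NULL WHERE hash = ?` statement come from the report.

```python
import sqlite3

TEST_HASH = "346a65d9d72550df64490ad8e9998622"
INGEST_DIR = "/mnt/library/_ingest"


def like_prefix(prefix):
    # Escape LIKE metacharacters so "_ingest" matches a literal underscore.
    for ch in ("\\", "%", "_"):
        prefix = prefix.replace(ch, "\\" + ch)
    return prefix + "%"


def get_ingest_pending(conn, ingest_dir, limit=50):
    # Unorganized documents whose path lies under ingest_dir, oldest first.
    sql = (
        "SELECT hash, path FROM documents "
        "WHERE organized_at IS NULL AND path LIKE ? ESCAPE '\\' "
        "ORDER BY id LIMIT ?"
    )
    return conn.execute(sql, (like_prefix(ingest_dir), limit)).fetchall()


def clear_organized_at(conn, doc_hash):
    # The statement quoted in the Bug 2 fix.
    conn.execute(
        "UPDATE documents SET organized_at = NULL WHERE hash = ?", (doc_hash,)
    )


def build_demo_db():
    # Hypothetical schema: 60 older backlog rows, then the test document.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, hash TEXT, path TEXT, organized_at TEXT)"
    )
    rows = [
        (f"backlog{i:02d}", f"/mnt/library/transcripts/{i}.txt", None)
        for i in range(60)
    ]
    rows.append((TEST_HASH, INGEST_DIR + "/TestDoc.pdf", None))
    conn.executemany(
        "INSERT INTO documents (hash, path, organized_at) VALUES (?, ?, ?)", rows
    )
    return conn


conn = build_demo_db()

# Bug 1: an unfiltered "oldest 50" query never reaches the test document.
oldest = conn.execute(
    "SELECT hash FROM documents WHERE organized_at IS NULL ORDER BY id LIMIT 50"
).fetchall()
print((TEST_HASH,) in oldest)                 # False
print(len(get_ingest_pending(conn, INGEST_DIR)))  # 1

# Bug 2: once organized_at is set, the row stays hidden until it is cleared.
conn.execute(
    "UPDATE documents SET organized_at = '2026-04-13' WHERE hash = ?", (TEST_HASH,)
)
print(len(get_ingest_pending(conn, INGEST_DIR)))  # 0
clear_organized_at(conn, TEST_HASH)
print(len(get_ingest_pending(conn, INGEST_DIR)))  # 1
```

One detail worth checking in the real query: in SQL `LIKE`, `_` matches any single character, so an unescaped `'/mnt/library/_ingest%'` would also match a sibling path such as `/mnt/library/Xingest/...`. The sketch escapes the underscore as a precaution; this only matters if such a sibling directory exists.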