Task C — Watchdog Two-Phase Ingest Test
Date: 2026-04-13
Test doc: TestDoc_Civil_Governance_Framework_2024.pdf (2,480 bytes, reportlab-generated)
Content hash: 346a65d9d72550df64490ad8e9998622
Phase A: Acquisition
Action
- Copied test PDF to `/mnt/library/_acquired/`
- Waited 12s for mtime stability
- Ran `ingest_scan()` manually
Result: PASS
acquired: 1, placed: 0, skipped: 0, failed: 0, duplicates: 0
Acquired TestDoc_Civil_Governance_Framework_2024.pdf -> /mnt/library/_ingest/TestDoc_Civil_Governance_Framework_2024.pdf [346a65d9]
Verification
| Check | Result |
|---|---|
| File removed from _acquired/ | YES |
| File present in _ingest/ | YES |
| Catalogue entry (status=queued) | YES |
| Documents entry (status=queued) | YES |
| book_title = None (not enriched) | YES |
| organized_at = None | YES |
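The Phase A step (wait for mtime stability, then move from `_acquired/` to `_ingest/`) can be sketched as below. This is a minimal sketch under assumptions, not the actual `ingest_scan()`:
- The helper names `is_stable`, `content_hash` and `acquire` are hypothetical.
- MD5 is assumed only because it matches the 32-hex-digit content hash above.
- The 12 s window mirrors the wait used in this test.
- Catalogue/documents queueing is left out.

```python
from __future__ import annotations

import hashlib
import os
import shutil
import tempfile
import time
from pathlib import Path

STABILITY_SECONDS = 12.0  # assumption: mirrors the 12s wait used in this test


def is_stable(path: Path, min_age: float = STABILITY_SECONDS, now: float | None = None) -> bool:
    """True if the file's mtime is at least min_age seconds in the past."""
    now = time.time() if now is None else now
    return now - path.stat().st_mtime >= min_age


def content_hash(path: Path) -> str:
    """MD5 hex digest of the file contents (the algorithm is an assumption)."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def acquire(acquired_dir: Path, ingest_dir: Path) -> list[tuple[str, Path]]:
    """Move stable files from _acquired/ into _ingest/; return (hash, new_path) pairs."""
    moved = []
    for src in sorted(acquired_dir.iterdir()):
        if not src.is_file() or not is_stable(src):
            continue  # not a file, or still being written: leave it for the next cycle
        file_hash = content_hash(src)
        dst = ingest_dir / src.name
        shutil.move(str(src), str(dst))
        # A real scan would also insert catalogue/documents rows with status=queued here.
        moved.append((file_hash, dst))
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        acquired, ingest = Path(tmp, "_acquired"), Path(tmp, "_ingest")
        acquired.mkdir()
        ingest.mkdir()
        doc = acquired / "doc.pdf"
        doc.write_bytes(b"%PDF-1.4 test")
        old = time.time() - 60
        os.utime(doc, (old, old))  # backdate mtime so the file counts as stable
        for h, p in acquire(acquired, ingest):
            print(f"Acquired {p.name} [{h[:8]}]")
```

A file whose mtime is newer than the window is skipped rather than moved, so a copy that is still in progress is never picked up half-written.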
RECON Pipeline Processing
The running RECON service (recon.service) automatically picked up the queued document.
Timeline
| Stage | Timestamp | Duration |
|---|---|---|
| Queued | 05:50:14 | — |
| Extracted | 05:50:40 | 26s |
| Enriched | 05:51:05 | 25s |
| Embedded | 05:51:25 | 20s |
| Total | 05:50:14 → 05:51:25 | ~71s |
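The stage durations and the total follow directly from the timestamps. The quick check below is plain arithmetic on the table values, not RECON code.

```python
from datetime import datetime

# Stage timestamps copied from the Timeline table (same day, HH:MM:SS).
stages = [
    ("Queued", "05:50:14"),
    ("Extracted", "05:50:40"),
    ("Enriched", "05:51:05"),
    ("Embedded", "05:51:25"),
]

times = [datetime.strptime(ts, "%H:%M:%S") for _, ts in stages]

# Each stage's duration is its timestamp minus the previous stage's timestamp.
durations = {
    name: int((t - prev).total_seconds())
    for (name, _), prev, t in zip(stages[1:], times, times[1:])
}
total = int((times[-1] - times[0]).total_seconds())

print(durations)  # {'Extracted': 26, 'Enriched': 25, 'Embedded': 20}
print(total)      # 71
```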
Enrichment Results
| Field | Value |
|---|---|
| book_title | Civil Governance Framework Analysis |
| book_author | Dr. James Mitchell |
| pages_extracted | 1 |
| concepts_extracted | 2 |
| vectors_inserted | 2 |
| status | complete |
Phase B: Library Placement
Action
- Ran `ingest_scan()` again after enrichment completed
Result: PASS
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
[Civil Organization/Governance, step 1, 2 vectors]
Verification
| Check | Result |
|---|---|
| File removed from _ingest/ | YES |
| File at Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf | YES |
| Filename derived from book_title (not original filename) | YES |
| Domain: Civil Organization | YES |
| Subdomain: Governance | YES |
| Collision step: 1 (base, no collision) | YES |
| documents.path updated | YES |
| documents.organized_at set | YES |
| catalogue.path updated | YES |
| file_operations entry created (id=81) | YES |
| Qdrant filename = Civil_Governance_Framework_Analysis.pdf | YES |
| Qdrant original_filename = TestDoc_Civil_Governance_Framework_2024.pdf | YES |
| Qdrant download_url = https://files.echo6.co/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf | YES |
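The naming rule implied by these Phase B results can be sketched as follows. Everything here is inferred from this single example and should be read as an assumption:
- Spaces become `_` in the filename and `-` in the domain folder.
- Step 1 means the bare name; later collision steps get a hypothetical `_2`, `_3`, ... suffix.
- `https://files.echo6.co` is assumed to serve `/mnt/library` at its root.
- `slug`, `placement_path` and `download_url` are illustrative helpers, not RECON's.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Callable
from urllib.parse import quote

LIBRARY_ROOT = PurePosixPath("/mnt/library")
FILES_BASE_URL = "https://files.echo6.co"  # assumption: maps to LIBRARY_ROOT


def slug(text: str, sep: str) -> str:
    """Drop characters other than letters, digits, space, '-' and '_', then join words with sep."""
    keep = "".join(c for c in text if c.isalnum() or c in " -_")
    return sep.join(keep.split())


def placement_path(
    title: str,
    domain: str,
    subdomain: str,
    exists: Callable[[PurePosixPath], bool],
    ext: str = ".pdf",
) -> tuple[PurePosixPath, int]:
    """Return (path, collision_step); step 1 is the base name with no suffix."""
    folder = LIBRARY_ROOT / slug(domain, "-") / slug(subdomain, "-")
    base = slug(title, "_")
    step = 1
    while True:
        # The "_<step>" suffix for collisions is hypothetical.
        name = f"{base}{ext}" if step == 1 else f"{base}_{step}{ext}"
        candidate = folder / name
        if not exists(candidate):
            return candidate, step
        step += 1


def download_url(path: PurePosixPath) -> str:
    """Public URL for a placed file, URL-quoting the path relative to the library root."""
    return f"{FILES_BASE_URL}/{quote(str(path.relative_to(LIBRARY_ROOT)))}"
```

For the test document, `placement_path("Civil Governance Framework Analysis", "Civil Organization", "Governance", exists=lambda p: False)` gives `/mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` at step 1. `download_url()` then maps that path to the URL recorded in Qdrant.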
Reverse Operation Test
Action
- Ran `reverse_operation(81, db, config)`
Result: PASS
Reversed operation 81: .../Civil_Governance_Framework_Analysis.pdf -> .../TestDoc_Civil_Governance_Framework_2024.pdf
Verification
| Check | Result |
|---|---|
| File back in _ingest/ | YES |
| File removed from Civil-Organization/Governance/ | YES |
| file_operations.reversed_at set | YES |
| Qdrant payloads reverted to _ingest paths | YES |
| DB paths reverted to _ingest | YES |
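A journal-based reverse can be sketched with an in-memory SQLite database, under these assumptions:
- The three-table schema (`documents`, `catalogue`, and `file_operations` with `src`/`dst` columns) is hypothetical.
- The function `reverse_op` is illustrative and is not the actual `reverse_operation(81, db, config)`.
- The Qdrant payload revert is omitted.
- It already includes the `organized_at = NULL` reset described under Bug 2 below.

```python
from __future__ import annotations

import shutil
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE documents (hash TEXT PRIMARY KEY, path TEXT, organized_at TEXT);
CREATE TABLE catalogue (hash TEXT PRIMARY KEY, path TEXT);
CREATE TABLE file_operations (
    id INTEGER PRIMARY KEY, hash TEXT, src TEXT, dst TEXT, reversed_at TEXT
);
"""


def reverse_op(op_id: int, conn: sqlite3.Connection) -> None:
    """Undo one recorded move: restore DB paths, clear organized_at, move the file back."""
    row = conn.execute(
        "SELECT hash, src, dst, reversed_at FROM file_operations WHERE id = ?", (op_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no file_operations row with id {op_id}")
    doc_hash, src, dst, reversed_at = row
    if reversed_at is not None:
        raise ValueError(f"operation {op_id} was already reversed")

    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: committed on success, rolled back on any exception
        # Clearing organized_at lets the next Phase B scan place the document again.
        conn.execute(
            "UPDATE documents SET path = ?, organized_at = NULL WHERE hash = ?",
            (src, doc_hash),
        )
        conn.execute("UPDATE catalogue SET path = ? WHERE hash = ?", (src, doc_hash))
        conn.execute(
            "UPDATE file_operations SET reversed_at = ? WHERE id = ?", (now, op_id)
        )
        # Reverting the Qdrant payloads would also belong here; omitted in this sketch.
        shutil.move(dst, src)  # done last so a failed move rolls back the rows above


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "_ingest", "TestDoc.pdf")
        dst = Path(tmp, "Governance", "Placed.pdf")
        src.parent.mkdir()
        dst.parent.mkdir()
        dst.write_bytes(b"%PDF-1.4 test")
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO documents VALUES ('h1', ?, '2026-04-13')", (str(dst),))
        conn.execute("INSERT INTO catalogue VALUES ('h1', ?)", (str(dst),))
        conn.execute(
            "INSERT INTO file_operations VALUES (81, 'h1', ?, ?, NULL)", (str(src), str(dst))
        )
        reverse_op(81, conn)
        print(src.exists(), dst.exists())  # True False
```

The DB updates and the file move share one `with conn:` block, with the move last. If the move fails, the row changes roll back. The opposite failure (move succeeds, commit fails) would still need a compensating move.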
Re-placement After Reverse
Action
- Cleared `organized_at` (simulating the fix applied to `reverse_operation`)
- Ran `ingest_scan()` again
Result: PASS
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
[Civil Organization/Governance, step 1, 2 vectors]
Final State
- File at correct standardized location
- 2 file_operations records: #81 (reversed), #82 (active)
- Qdrant payloads correct
- All DB records consistent
Bugs Found & Fixed During Test
Bug 1: Phase B query overwhelmed by unorganized docs (FIXED)
Problem: ingest_scan() Phase B used db.get_unorganized(limit=50) which returns the 50 oldest unorganized docs. With 29,469 unorganized docs (mostly PeerTube transcripts), the test doc was never reached.
Fix: Added StatusDB.get_ingest_pending(ingest_dir, limit=50) method that filters by path (WHERE path LIKE '/mnt/library/_ingest%'). Updated ingest_scan() to use this instead.
Files changed:
- `/opt/recon/lib/status.py`: added `get_ingest_pending()` method
- `/opt/recon/lib/new_pipeline.py`: updated Phase B in `ingest_scan()`
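A path-filtered pending query can be sketched as below. This is a minimal sketch under assumptions:
- The SQLite-style `documents` table has `hash`, `path` and `organized_at` columns.
- `rowid` stands in for document age.
- The actual `StatusDB.get_ingest_pending()` and its schema may differ.

One detail worth noting: in SQL `LIKE`, `_` matches any single character. The literal pattern `'/mnt/library/_ingest%'` would therefore also match a sibling such as `/mnt/library/Xingest/`. The sketch escapes `%` and `_` with an `ESCAPE` clause and appends a trailing `/` so that a directory like `_ingest_old/` does not match either.

```python
from __future__ import annotations

import sqlite3


def like_prefix(prefix: str, escape: str = "\\") -> str:
    """Build a LIKE pattern that matches `prefix` literally, followed by anything."""
    # Escape the escape character first, then the LIKE wildcards % and _.
    for ch in (escape, "%", "_"):
        prefix = prefix.replace(ch, escape + ch)
    return prefix + "%"


def get_ingest_pending(
    conn: sqlite3.Connection, ingest_dir: str, limit: int = 50
) -> list[tuple[str, str]]:
    """Unorganized documents whose path is under ingest_dir, oldest (lowest rowid) first."""
    pattern = like_prefix(ingest_dir.rstrip("/") + "/")
    return conn.execute(
        "SELECT hash, path FROM documents "
        "WHERE organized_at IS NULL AND path LIKE ? ESCAPE '\\' "
        "ORDER BY rowid LIMIT ?",
        (pattern, limit),
    ).fetchall()
```

Because the path filter is applied before `LIMIT`, the 50-row window only ever contains `_ingest/` documents. The unrelated backlog of unorganized transcripts can no longer crowd them out.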
Bug 2: Reverse doesn't clear organized_at (FIXED)
Problem: After reversing a placement, organized_at remained set, preventing Phase B from re-triggering placement on the next watchdog cycle.
Fix: Added UPDATE documents SET organized_at = NULL WHERE hash = ? to reverse_operation().
Files changed:
- `/opt/recon/lib/new_pipeline.py`: added `organized_at` clear in `reverse_operation()`
Non-bug: Watchdog logging
Observation: recon.py pipeline watch produces no stdout/stderr output. run_watchdog() logs through logging.getLogger('recon.pipeline'), and the only handlers for that logger hierarchy are attached to a parent logger by setup_logging(), which is called only in service mode. Not a functional issue: in service mode, logs go to /opt/recon/logs/recon.log.
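When running `pipeline watch` in a terminal, one option is to attach a console handler to the parent logger in CLI mode. This is a sketch under assumptions: the parent logger name `recon` is inferred from `recon.pipeline`, and `enable_console_logging()` is a hypothetical helper, not part of RECON.

It relies on standard `logging` behaviour. With no handlers anywhere in the hierarchy, Python falls back to `logging.lastResort`, which writes only WARNING and above to stderr. A logger with no level set inherits the root's WARNING. Together, these mean INFO records from `run_watchdog()` are silently dropped.

```python
import io
import logging
import sys
from typing import Optional, TextIO


def enable_console_logging(
    level: int = logging.INFO, stream: Optional[TextIO] = None
) -> logging.Handler:
    """Attach a console handler to the 'recon' parent logger (hypothetical CLI-mode helper).

    Child loggers such as 'recon.pipeline' propagate records to it, so
    run_watchdog() output becomes visible without changing service-mode logging.
    """
    parent = logging.getLogger("recon")
    parent.setLevel(level)  # otherwise the effective level is the root's WARNING
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    parent.addHandler(handler)
    return handler


if __name__ == "__main__":
    buf = io.StringIO()
    enable_console_logging(stream=buf)
    logging.getLogger("recon.pipeline").info("watchdog cycle started")
    print(buf.getvalue(), end="")
```

Call it only in CLI mode. In service mode, setup_logging() already configures the parent logger, and adding this handler there would duplicate output.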
Cleanup
- Pipeline disabled: `new_pipeline.enabled: false`
- Watchdog process killed
- Test document left in place at `Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` (valid document, no reason to remove)
- Local copies synced
Verdict: PASS
All phases of the two-phase ingest pipeline work correctly:
- Phase A acquires files from `_acquired/` to `_ingest/` and queues them for processing
- RECON pipeline processes queued documents normally (extract → enrich → embed)
- Phase B places enriched documents with standardized filenames derived from `book_title`
- Reverse operation correctly undoes placement (file, DB, Qdrant)
- Re-placement after reverse works correctly
- Two bugs found and fixed during testing (query efficiency + organized_at reset)