echo6-docs/reports/task_c_watchdog_test.md
echo6-autocommit abb0bd0b7c auto: docs sync 2026-04-13T12:00:05+00:00
Files changed: docs/services/services.md reports/logistics_migration.md reports/post_validation_report.md reports/task_a_aurora_validation.md reports/task_c_watchdog_test.md
2026-04-13 12:00:05 +00:00

5.8 KiB

Task C — Watchdog Two-Phase Ingest Test

Date: 2026-04-13 Test doc: TestDoc_Civil_Governance_Framework_2024.pdf (2,480 bytes, reportlab-generated) Content hash: 346a65d9d72550df64490ad8e9998622


Phase A: Acquisition

Action

  • Copied test PDF to /mnt/library/_acquired/
  • Waited 12s for mtime stability
  • Ran ingest_scan() manually

Result: PASS

acquired: 1, placed: 0, skipped: 0, failed: 0, duplicates: 0
Acquired TestDoc_Civil_Governance_Framework_2024.pdf -> /mnt/library/_ingest/TestDoc_Civil_Governance_Framework_2024.pdf [346a65d9]

Verification

Check Result
File removed from _acquired/ YES
File present in _ingest/ YES
Catalogue entry (status=queued) YES
Documents entry (status=queued) YES
book_title = None (not enriched) YES
organized_at = None YES

RECON Pipeline Processing

The running RECON service (recon.service) automatically picked up the queued document.

Timeline

Stage Timestamp Duration
Queued 05:50:14
Extracted 05:50:40 26s
Enriched 05:51:05 25s
Embedded 05:51:25 20s
Total ~71s

Enrichment Results

Field Value
book_title Civil Governance Framework Analysis
book_author Dr. James Mitchell
pages_extracted 1
concepts_extracted 2
vectors_inserted 2
status complete

Phase B: Library Placement

Action

  • Ran ingest_scan() again after enrichment completed

Result: PASS

acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
  [Civil Organization/Governance, step 1, 2 vectors]

Verification

Check Result
File removed from _ingest/ YES
File at Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf YES
Filename derived from book_title (not original filename) YES
Domain: Civil Organization YES
Subdomain: Governance YES
Collision step: 1 (base, no collision) YES
documents.path updated YES
documents.organized_at set YES
catalogue.path updated YES
file_operations entry created (id=81) YES
Qdrant filename = Civil_Governance_Framework_Analysis.pdf YES
Qdrant original_filename = TestDoc_Civil_Governance_Framework_2024.pdf YES
Qdrant download_url = https://files.echo6.co/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf YES

Reverse Operation Test

Action

  • Ran reverse_operation(81, db, config)

Result: PASS

Reversed operation 81: .../Civil_Governance_Framework_Analysis.pdf -> .../TestDoc_Civil_Governance_Framework_2024.pdf

Verification

Check Result
File back in _ingest/ YES
File removed from Civil-Organization/Governance/ YES
file_operations.reversed_at set YES
Qdrant payloads reverted to _ingest paths YES
DB paths reverted to _ingest YES

Re-placement After Reverse

Action

  • Cleared organized_at (simulating the fix applied to reverse_operation)
  • Ran ingest_scan() again

Result: PASS

acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
  [Civil Organization/Governance, step 1, 2 vectors]

Final State

  • File at correct standardized location
  • 2 file_operations records: #81 (reversed), #82 (active)
  • Qdrant payloads correct
  • All DB records consistent

Bugs Found & Fixed During Test

Bug 1: Phase B query overwhelmed by unorganized docs (FIXED)

Problem: ingest_scan() Phase B used db.get_unorganized(limit=50) which returns the 50 oldest unorganized docs. With 29,469 unorganized docs (mostly PeerTube transcripts), the test doc was never reached.

Fix: Added StatusDB.get_ingest_pending(ingest_dir, limit=50) method that filters by path (WHERE path LIKE '/mnt/library/_ingest%'). Updated ingest_scan() to use this instead.

Files changed:

  • /opt/recon/lib/status.py — added get_ingest_pending() method
  • /opt/recon/lib/new_pipeline.py — updated Phase B in ingest_scan()

Bug 2: Reverse doesn't clear organized_at (FIXED)

Problem: After reversing a placement, organized_at remained set, preventing Phase B from re-triggering placement on the next watchdog cycle.

Fix: Added UPDATE documents SET organized_at = NULL WHERE hash = ? to reverse_operation().

Files changed:

  • /opt/recon/lib/new_pipeline.py — added organized_at clear in reverse_operation()

Non-bug: Watchdog logging

Observation: recon.py pipeline watch produces no stdout/stderr output because run_watchdog() uses logging.getLogger('recon.pipeline') which only has handlers configured when setup_logging() is called for a parent logger during service mode. Not a functional issue — logs go to /opt/recon/logs/recon.log in service mode.


Cleanup

  • Pipeline disabled: new_pipeline.enabled: false
  • Watchdog process killed
  • Test document left in place at Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf (valid document, no reason to remove)
  • Local copies synced

Verdict: PASS

All phases of the two-phase ingest pipeline work correctly:

  1. Phase A acquires files from _acquired/ to _ingest/ and queues for processing
  2. RECON pipeline processes queued documents normally (extract → enrich → embed)
  3. Phase B places enriched documents with standardized filenames derived from book_title
  4. Reverse operation correctly undoes placement (file, DB, Qdrant)
  5. Re-placement after reverse works correctly
  6. Two bugs found and fixed during testing (query efficiency + organized_at reset)