auto: docs sync 2026-04-13T12:00:05+00:00
Files changed: docs/services/services.md reports/logistics_migration.md reports/post_validation_report.md reports/task_a_aurora_validation.md reports/task_c_watchdog_test.md
parent 5f378b1903
commit abb0bd0b7c
5 changed files with 611 additions and 2 deletions
176
reports/task_c_watchdog_test.md
Normal file
@ -0,0 +1,176 @@
# Task C — Watchdog Two-Phase Ingest Test

**Date:** 2026-04-13
**Test doc:** `TestDoc_Civil_Governance_Framework_2024.pdf` (2,480 bytes, reportlab-generated)
**Content hash:** `346a65d9d72550df64490ad8e9998622`

---

## Phase A: Acquisition

### Action
- Copied test PDF to `/mnt/library/_acquired/`
- Waited 12s for mtime stability
- Ran `ingest_scan()` manually
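
The 12-second wait suggests that the scan only acquires files whose modification time has stopped changing, so a file that is still being copied is not picked up half-written. A minimal sketch of such a check follows. The function name `is_mtime_stable` and the 10-second default are hypothetical and are not taken from the RECON code:

```python
import os
import time
from typing import Optional


def is_mtime_stable(path: str, quiet_seconds: float = 10.0,
                    now: Optional[float] = None) -> bool:
    """Return True if `path` was last modified at least `quiet_seconds` ago.

    `quiet_seconds` is an assumed threshold; the real value is not
    stated in this report.
    """
    current = time.time() if now is None else now
    return (current - os.stat(path).st_mtime) >= quiet_seconds
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it would be left at its default so the current clock is used.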

### Result: PASS
```
acquired: 1, placed: 0, skipped: 0, failed: 0, duplicates: 0
Acquired TestDoc_Civil_Governance_Framework_2024.pdf -> /mnt/library/_ingest/TestDoc_Civil_Governance_Framework_2024.pdf [346a65d9]
```
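
The bracketed `346a65d9` in the log line matches the first eight hex digits of the content hash recorded for the test document. The sketch below shows one way such a hash and its short form could be computed. The 32-digit length is consistent with MD5, but this report does not name the algorithm, so `hashlib.md5` is an assumption, as are the helper names `content_hash` and `short_hash`:

```python
import hashlib


def content_hash(path: str, chunk_size: int = 65536) -> str:
    """Hash a file's bytes in chunks (MD5 is assumed, not confirmed)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def short_hash(full_hash: str) -> str:
    """Return the 8-character prefix used in log lines."""
    return full_hash[:8]
```

With the hash from this report, `short_hash("346a65d9d72550df64490ad8e9998622")` returns `"346a65d9"`.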

### Verification
| Check | Result |
|-------|--------|
| File removed from `_acquired/` | YES |
| File present in `_ingest/` | YES |
| Catalogue entry (status=queued) | YES |
| Documents entry (status=queued) | YES |
| book_title = None (not enriched) | YES |
| organized_at = None | YES |

---

## RECON Pipeline Processing

The running RECON service (`recon.service`) automatically picked up the queued document.

### Timeline
| Stage | Timestamp | Duration |
|-------|-----------|----------|
| Queued | 05:50:14 | — |
| Extracted | 05:50:40 | 26s |
| Enriched | 05:51:05 | 25s |
| Embedded | 05:51:25 | 20s |
| **Total** | | **~71s** |
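
The per-stage durations and the total can be checked from the timestamps alone. The short script below recomputes them from the values in the table, assuming all four timestamps fall on the same day; the names `STAGES` and `stage_durations` are illustrative only:

```python
from datetime import datetime

# (stage, timestamp) pairs copied from the timeline table.
STAGES = [
    ("Queued", "05:50:14"),
    ("Extracted", "05:50:40"),
    ("Enriched", "05:51:05"),
    ("Embedded", "05:51:25"),
]


def stage_durations(stages):
    """Return (stage, seconds since the previous stage) for each later stage."""
    times = [datetime.strptime(ts, "%H:%M:%S") for _, ts in stages]
    return [
        (name, int((cur - prev).total_seconds()))
        for (name, _), prev, cur in zip(stages[1:], times, times[1:])
    ]


durations = stage_durations(STAGES)
total_seconds = sum(seconds for _, seconds in durations)
print(durations)      # [('Extracted', 26), ('Enriched', 25), ('Embedded', 20)]
print(total_seconds)  # 71
```

At one-second timestamp resolution the stages sum to 71 seconds, which agrees with the total in the table.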

### Enrichment Results
| Field | Value |
|-------|-------|
| book_title | Civil Governance Framework Analysis |
| book_author | Dr. James Mitchell |
| pages_extracted | 1 |
| concepts_extracted | 2 |
| vectors_inserted | 2 |
| status | complete |

---

## Phase B: Library Placement

### Action
- Ran `ingest_scan()` again after enrichment completed

### Result: PASS
```
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
[Civil Organization/Governance, step 1, 2 vectors]
```

### Verification
| Check | Result |
|-------|--------|
| File removed from `_ingest/` | YES |
| File at `Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` | YES |
| Filename derived from book_title (not original filename) | YES |
| Domain: Civil Organization | YES |
| Subdomain: Governance | YES |
| Collision step: 1 (base, no collision) | YES |
| documents.path updated | YES |
| documents.organized_at set | YES |
| catalogue.path updated | YES |
| file_operations entry created (id=81) | YES |
| Qdrant filename = `Civil_Governance_Framework_Analysis.pdf` | YES |
| Qdrant original_filename = `TestDoc_Civil_Governance_Framework_2024.pdf` | YES |
| Qdrant download_url = `https://files.echo6.co/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` | YES |
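
The checks above pin down the mapping observed for this one document: spaces in `book_title` become underscores in the filename, the space in the domain becomes a hyphen in the directory name, and the download URL mirrors the path under the library root. The sketch below reproduces only that observed mapping. The function names are hypothetical, the treatment of multi-word subdomains is an assumption, and the real naming rules (punctuation handling, collision steps above 1) are not described in this report:

```python
from pathlib import PurePosixPath

LIBRARY_ROOT = PurePosixPath("/mnt/library")
DOWNLOAD_BASE = "https://files.echo6.co"


def placement_relpath(book_title: str, domain: str, subdomain: str,
                      ext: str = ".pdf") -> PurePosixPath:
    """Build the library-relative path for an enriched document."""
    filename = "_".join(book_title.split()) + ext   # spaces -> underscores
    domain_dir = "-".join(domain.split())           # spaces -> hyphens
    subdomain_dir = "-".join(subdomain.split())     # assumed same rule as domain
    return PurePosixPath(domain_dir) / subdomain_dir / filename


def download_url(relpath: PurePosixPath) -> str:
    """Mirror the relative path under the download base (no URL escaping)."""
    return f"{DOWNLOAD_BASE}/{relpath}"


rel = placement_relpath("Civil Governance Framework Analysis",
                        "Civil Organization", "Governance")
print(LIBRARY_ROOT / rel)
print(download_url(rel))
```

For the test document, both printed values match the placement path and the Qdrant `download_url` recorded in the table.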

---

## Reverse Operation Test

### Action
- Ran `reverse_operation(81, db, config)`

### Result: PASS
```
Reversed operation 81: .../Civil_Governance_Framework_Analysis.pdf -> .../TestDoc_Civil_Governance_Framework_2024.pdf
```

### Verification
| Check | Result |
|-------|--------|
| File back in `_ingest/` | YES |
| File removed from `Civil-Organization/Governance/` | YES |
| file_operations.reversed_at set | YES |
| Qdrant payloads reverted to _ingest paths | YES |
| DB paths reverted to _ingest | YES |

---

## Re-placement After Reverse

### Action
- Cleared `organized_at` (simulating the fix applied to `reverse_operation`)
- Ran `ingest_scan()` again

### Result: PASS
```
acquired: 0, placed: 1, skipped: 0, failed: 0, duplicates: 0
Placed 346a65d9 -> /mnt/library/Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf
[Civil Organization/Governance, step 1, 2 vectors]
```

### Final State
- File at correct standardized location
- 2 file_operations records: #81 (reversed), #82 (active)
- Qdrant payloads correct
- All DB records consistent

---

## Bugs Found & Fixed During Test

### Bug 1: Phase B query overwhelmed by unorganized docs (FIXED)

**Problem:** Phase B of `ingest_scan()` used `db.get_unorganized(limit=50)`, which returns the 50 oldest unorganized docs. With 29,469 unorganized docs in the database (mostly PeerTube transcripts), the query never reached the test doc.

**Fix:** Added a `StatusDB.get_ingest_pending(ingest_dir, limit=50)` method that filters by path (`WHERE path LIKE '/mnt/library/_ingest%'`), and updated `ingest_scan()` to use it.

**Files changed:**
- `/opt/recon/lib/status.py` — added `get_ingest_pending()` method
- `/opt/recon/lib/new_pipeline.py` — updated Phase B in `ingest_scan()`
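
The Bug 1 fix can be sketched as a path-filtered query. This is an illustration only. It assumes a SQLite database (the `?` placeholder in the Bug 2 fix hints at this, but the report does not say so), a `documents` table with `hash`, `path`, and `organized_at` columns, and oldest-first ordering by `rowid`. It is written as a free function rather than the real `StatusDB` method, which may differ. One detail to be aware of: `_` is a single-character wildcard in SQL `LIKE`, so the sketch escapes it; whether the real query does so is not stated.

```python
import sqlite3
from typing import List, Tuple


def get_ingest_pending(conn: sqlite3.Connection, ingest_dir: str,
                       limit: int = 50) -> List[Tuple[str, str]]:
    """Return (hash, path) for unorganized docs located under `ingest_dir`."""
    # Escape the LIKE metacharacters so "_ingest" matches literally.
    escaped = (ingest_dir.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT hash, path FROM documents "
        "WHERE organized_at IS NULL AND path LIKE ? ESCAPE '\\' "
        "ORDER BY rowid LIMIT ?",
        (escaped + "%", limit),
    )
    return rows.fetchall()
```

Because the filter runs before the `LIMIT`, a large backlog of unorganized documents elsewhere in the library no longer crowds out the few that sit in `_ingest/`.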

### Bug 2: Reverse doesn't clear organized_at (FIXED)

**Problem:** After reversing a placement, `organized_at` remained set, preventing Phase B from re-triggering placement on the next watchdog cycle.

**Fix:** Added `UPDATE documents SET organized_at = NULL WHERE hash = ?` to `reverse_operation()`.

**Files changed:**
- `/opt/recon/lib/new_pipeline.py` — added organized_at clear in `reverse_operation()`
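
In isolation, the added statement behaves as sketched below. The `UPDATE` text is the one quoted in the fix; the use of `sqlite3`, the helper name `clear_organized_at`, and the explicit commit are assumptions made for illustration and may not match how `reverse_operation()` manages its connection:

```python
import sqlite3


def clear_organized_at(conn: sqlite3.Connection, doc_hash: str) -> int:
    """Reset organized_at for one document; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE documents SET organized_at = NULL WHERE hash = ?",
        (doc_hash,),
    )
    conn.commit()
    return cur.rowcount
```

A return value of 0 would indicate that no document with that hash exists, which a caller could treat as a warning rather than silently continuing.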

### Non-bug: Watchdog logging

**Observation:** `recon.py pipeline watch` produces no stdout/stderr output. `run_watchdog()` logs through `logging.getLogger('recon.pipeline')`, and that logger only has handlers when `setup_logging()` has been called for a parent logger, which happens in service mode. This is not a functional issue: in service mode, logs go to `/opt/recon/logs/recon.log`.
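
For manual runs, one way to see the watchdog's messages would be to attach a console handler to the same logger before starting the watch loop. The sketch uses only the standard `logging` module; the helper name `attach_console_handler` is hypothetical and is not part of RECON. In Python, a logger with no handlers anywhere in its hierarchy falls back to a last-resort handler that prints only WARNING and above, which would explain the silence if the watchdog logs at INFO (an assumption).

```python
import logging
from typing import Optional, TextIO


def attach_console_handler(logger_name: str = "recon.pipeline",
                           level: int = logging.INFO,
                           stream: Optional[TextIO] = None) -> logging.Handler:
    """Attach a stream handler so records from `logger_name` become visible."""
    logger = logging.getLogger(logger_name)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

The handler is returned so the caller can remove it again with `logger.removeHandler(handler)` and avoid duplicate output if service-mode logging is configured later.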

---

## Cleanup

- Pipeline disabled: `new_pipeline.enabled: false`
- Watchdog process killed
- Test document left in place at `Civil-Organization/Governance/Civil_Governance_Framework_Analysis.pdf` (valid document, no reason to remove)
- Local copies synced

---

## Verdict: PASS

All phases of the two-phase ingest pipeline work correctly:
1. Phase A acquires files from `_acquired/` to `_ingest/` and queues them for processing
2. RECON pipeline processes queued documents normally (extract → enrich → embed)
3. Phase B places enriched documents with standardized filenames derived from `book_title`
4. Reverse operation correctly undoes placement (file, DB, Qdrant)
5. Re-placement after reverse works correctly

Two bugs were found and fixed during testing (Phase B query scope and `organized_at` reset).