From 2a1d211d7cb3291db51fa0b9a6cbc12a156d0213 Mon Sep 17 00:00:00 2001
From: Matt
Date: Tue, 14 Apr 2026 15:04:13 +0000
Subject: [PATCH] Phase 2: shared filing function

Co-Authored-By: Claude Opus 4.6
---
 phases/phase-2-shared-filing.md | 115 ++++++++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 phases/phase-2-shared-filing.md

diff --git a/phases/phase-2-shared-filing.md b/phases/phase-2-shared-filing.md
new file mode 100644
index 0000000..7382fb3
--- /dev/null
+++ b/phases/phase-2-shared-filing.md
@@ -0,0 +1,115 @@
+# Phase 2: Shared Filing Function
+
+**Executed:** 2026-04-14T15:15Z UTC
+
+---
+
+## Backup
+
+| Item | Location | MD5 Hash |
+|------|----------|----------|
+| recon.db (pre-Phase 2) | CT 130: `/tmp/recon.db.phase2.20260414.bak` | `20ec1fec2247a999e7d42f6a716481b0` |
+
+---
+
+## Git Setup (prerequisite work)
+
+`/opt/recon` was not a git repository. Initialized and pushed:
+
+- **Repo:** https://forge.echo6.co/matt/recon (private)
+- **Auth:** HTTPS with API token (SSH key on CT 130 was already registered elsewhere in Forgejo)
+- **Initial commit:** `563c16b` — full codebase baseline on `master`
+- **Refactor branch:** `refactor` created from `master`
+
+---
+
+## What Was Created
+
+### `lib/filing.py` — `file_processed_item()` function
+
+**RECON branch:** `refactor`
+**Commit:** `de2c59a`
+
+A shared filing function that any future processor can call to file a completed item from the processing stage into the organized library.
+
+**Signature:**
+```python
+def file_processed_item(doc_hash, source_file_path, db, config, dry_run=False) -> dict
+```
+
+**Return dict keys:** `hash`, `action`, `source_path`, `target_path`, `domain`, `subdomain`, `qdrant_points_updated`, `error`
+
+**Action values:** `filed`, `skip_unclassified`, `skip_already_filed`, `would_file`, `error`
+
+**What it does (in order):**
+1. Verifies source file exists
+2. Calls `determine_dominant_domain()` to classify from concept JSONs
+3. Looks up original filename from catalogue
+4. Calls `_build_target_path()` with collision handling
+5. Checks idempotency (source == target → skip_already_filed)
+6. In dry_run: returns `would_file` without moving
+7. Moves file with `shutil.move()`
+8. Updates catalogue path, documents path, marks organized
+9. Updates Qdrant payloads (download_url, filename, original_filename)
+
+---
+
+## Dependencies on Existing Code
+
+| Module | Function/Method | Purpose |
+|--------|----------------|---------|
+| `lib/organizer.py` | `determine_dominant_domain(doc_hash, data_dir)` | Domain classification from concept JSONs |
+| `lib/organizer.py` | `_build_target_path(library_root, domain, subdomain, filename, doc_hash)` | Target path with collision handling |
+| `lib/new_pipeline.py` | `update_qdrant_payload(doc_hash, new_path, new_filename, original_filename, config)` | Qdrant payload sync |
+| `lib/status.py` | `StatusDB.update_catalogue_path(hash, path, filename)` | Catalogue DB update |
+| `lib/status.py` | `StatusDB.sync_document_path(hash, path, filename)` | Documents DB update |
+| `lib/status.py` | `StatusDB.mark_organized(hash)` | Set organized_at timestamp |
+| `lib/status.py` | `StatusDB._get_conn()` | Thread-local SQLite connection |
+
+---
+
+## Testing
+
+### Import test
+```
+python3 -c "from lib.filing import file_processed_item; print('Import OK')"
+→ Import OK
+```
+
+### Dry-run test against real data
+Document: `3c8512868fa568a861c7994019ed5e88` (U.S. Army Reconnaissance And Surveillance Handbook)
+
+```
+action: would_file
+domain: Defense & Tactics
+subdomain: Reconnaissance
+target_path: /mnt/library/Defense-and-Tactics/Reconnaissance/U.S. Army Reconnaissance And Surveillance Handbook.pdf
+qdrant_points_updated: 0 (dry_run — no actual update)
+error: None
+```
+
+The function correctly classified the document, derived the canonical path, and returned `would_file` (source path uses underscores, target uses spaces — slight rename).
+
+---
+
+## What Did NOT Change
+
+- **No existing files modified:** `lib/organizer.py`, `lib/status.py`, `lib/new_pipeline.py`, `lib/utils.py`, `recon.py` — all untouched
+- **No data modified:** catalogue=29,812, documents=29,812 (unchanged)
+- **No service state changed:** Both services remain inactive
+- **Processing directory empty:** No files placed in `/opt/recon/data/processing/`
+- **Legacy `organize_document()` untouched** — remains available for existing code paths
+
+---
+
+## Verification
+
+| Check | Result |
+|-------|--------|
+| catalogue rows | 29,812 |
+| documents rows | 29,812 |
+| processing/ files | 0 |
+| recon.service | inactive |
+| recon-watchdog.service | inactive |
+| Import test | passed |
+| Dry-run test | passed (would_file) |