mirror of
https://github.com/zvx-echo6/refactored-recon.git
synced 2026-05-20 14:44:39 +02:00
Phase 2: shared filing function
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent b697404df2
commit 2a1d211d7c
1 changed files with 115 additions and 0 deletions
phases/phase-2-shared-filing.md
# Phase 2: Shared Filing Function

**Executed:** 2026-04-14T15:15Z

---
## Backup

| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db (pre-Phase 2) | CT 130: `/tmp/recon.db.phase2.20260414.bak` | `20ec1fec2247a999e7d42f6a716481b0` |
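One way to confirm that the backup still matches the recorded hash is to recompute its MD5 digest. The following is a minimal sketch: the helper name `md5_of_file` and the chunk size are illustrative assumptions and not part of the RECON codebase; only the path and the expected hash come from the table above.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large database file is never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    backup = Path("/tmp/recon.db.phase2.20260414.bak")  # path from the table
    expected = "20ec1fec2247a999e7d42f6a716481b0"        # hash from the table
    if backup.exists():
        print("match" if md5_of_file(backup) == expected else "MISMATCH")
    else:
        print(f"{backup} not found")
```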
---

## Git Setup (prerequisite work)

`/opt/recon` was not a git repository. Initialized and pushed:

- **Repo:** https://forge.echo6.co/matt/recon (private)
- **Auth:** HTTPS with API token (SSH key on CT 130 was already registered elsewhere in Forgejo)
- **Initial commit:** `563c16b`, full codebase baseline on `master`
- **Refactor branch:** `refactor` created from `master`
---

## What Was Created

### `lib/filing.py`: `file_processed_item()` function

**RECON branch:** `refactor`

**Commit:** `de2c59a`

A shared filing function that any future processor can call to file a completed item from the processing stage into the organized library.

**Signature:**
```python
def file_processed_item(doc_hash, source_file_path, db, config, dry_run=False) -> dict: ...
```
**Return dict keys:** `hash`, `action`, `source_path`, `target_path`, `domain`, `subdomain`, `qdrant_points_updated`, `error`

**Action values:** `filed`, `skip_unclassified`, `skip_already_filed`, `would_file`, `error`

**What it does (in order):**

1. Verifies source file exists
2. Calls `determine_dominant_domain()` to classify from concept JSONs
3. Looks up original filename from catalogue
4. Calls `_build_target_path()` with collision handling
5. Checks idempotency (source == target → `skip_already_filed`)
6. In `dry_run` mode: returns `would_file` without moving
7. Moves file with `shutil.move()`
8. Updates catalogue path, documents path, marks organized
9. Updates Qdrant payloads (download_url, filename, original_filename)
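The ordering of the steps above, and how each early exit maps to an action value, can be sketched roughly as follows. This is not the code in `lib/filing.py`: the function name `file_item_sketch` and the four injected callables (`classify`, `lookup_filename`, `build_target`, `update_records`) are hypothetical stand-ins for the real classification, catalogue, path-building, and DB/Qdrant calls. The sketch also assumes that an unclassifiable document yields `(None, None)` and that the target directory may need to be created before the move.

```python
import shutil
from pathlib import Path

RESULT_KEYS = ("hash", "action", "source_path", "target_path",
               "domain", "subdomain", "qdrant_points_updated", "error")


def file_item_sketch(doc_hash, source_path, classify, lookup_filename,
                     build_target, update_records, dry_run=False):
    """Illustrative control flow only; all callables are hypothetical stubs."""
    result = dict.fromkeys(RESULT_KEYS)  # every documented key, defaulting to None
    result.update(hash=doc_hash, source_path=str(source_path),
                  qdrant_points_updated=0)
    source = Path(source_path)

    # 1. Verify the source file exists.
    if not source.is_file():
        result.update(action="error", error=f"source not found: {source}")
        return result

    # 2. Classify (assumption: (None, None) means no domain could be determined).
    domain, subdomain = classify(doc_hash)
    if domain is None:
        result["action"] = "skip_unclassified"
        return result
    result.update(domain=domain, subdomain=subdomain)

    # 3-4. Look up the original filename, then build the target path.
    filename = lookup_filename(doc_hash) or source.name
    target = Path(build_target(domain, subdomain, filename))
    result["target_path"] = str(target)

    # 5. Idempotency: nothing to do if the file is already in place.
    if target == source:
        result["action"] = "skip_already_filed"
        return result

    # 6. Dry run: report the plan without touching the filesystem.
    if dry_run:
        result["action"] = "would_file"
        return result

    # 7. Move the file (creating the directory first is an assumption).
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))

    # 8-9. Update the databases and Qdrant; the stub returns the point count.
    result["qdrant_points_updated"] = update_records(doc_hash, target)
    result["action"] = "filed"
    return result
```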
---

## Dependencies on Existing Code

| Module | Function/Method | Purpose |
|--------|----------------|---------|
| `lib/organizer.py` | `determine_dominant_domain(doc_hash, data_dir)` | Domain classification from concept JSONs |
| `lib/organizer.py` | `_build_target_path(library_root, domain, subdomain, filename, doc_hash)` | Target path with collision handling |
| `lib/new_pipeline.py` | `update_qdrant_payload(doc_hash, new_path, new_filename, original_filename, config)` | Qdrant payload sync |
| `lib/status.py` | `StatusDB.update_catalogue_path(hash, path, filename)` | Catalogue DB update |
| `lib/status.py` | `StatusDB.sync_document_path(hash, path, filename)` | Documents DB update |
| `lib/status.py` | `StatusDB.mark_organized(hash)` | Set organized_at timestamp |
| `lib/status.py` | `StatusDB._get_conn()` | Thread-local SQLite connection |
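For readers unfamiliar with the last row, a thread-local SQLite connection is commonly built on `threading.local`, so each thread lazily opens and then reuses its own connection. The class below is a hypothetical stand-in that shows only that pattern; it is not the actual `StatusDB`, whose internals are not reproduced in this report.

```python
import sqlite3
import threading


class ThreadLocalDB:
    """Hypothetical stand-in for StatusDB, showing only the connection pattern."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._local = threading.local()  # attributes set here are per-thread

    def _get_conn(self):
        """Return this thread's connection, opening it on first use."""
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self.db_path)
            conn.row_factory = sqlite3.Row  # rows addressable by column name
            self._local.conn = conn
        return conn
```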
---

## Testing

### Import test
```
python3 -c "from lib.filing import file_processed_item; print('Import OK')"
→ Import OK
```
### Dry-run test against real data

Document: `3c8512868fa568a861c7994019ed5e88` (U.S. Army Reconnaissance And Surveillance Handbook)

```
action: would_file
domain: Defense & Tactics
subdomain: Reconnaissance
target_path: /mnt/library/Defense-and-Tactics/Reconnaissance/U.S. Army Reconnaissance And Surveillance Handbook.pdf
qdrant_points_updated: 0 (dry_run: no actual update)
error: None
```

The function correctly classified the document, derived the canonical path, and returned `would_file`. The source path uses underscores where the target uses spaces, so filing would rename the file slightly.

---
## What Did NOT Change

- **No existing files modified:** `lib/organizer.py`, `lib/status.py`, `lib/new_pipeline.py`, `lib/utils.py`, `recon.py` are all untouched
- **No data modified:** catalogue=29,812, documents=29,812 (unchanged)
- **No service state changed:** Both services remain inactive
- **Processing directory empty:** No files placed in `/opt/recon/data/processing/`
- **Legacy `organize_document()` untouched:** remains available for existing code paths
---

## Verification

| Check | Result |
|-------|--------|
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| processing/ files | 0 |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Import test | passed |
| Dry-run test | passed (would_file) |
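Checks like the row counts and service states above could be scripted along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, the table names `catalogue` and `documents` are inferred from the check labels, and the database path is left to the caller because this report does not state the live location of `recon.db`.

```python
import sqlite3
import subprocess


def row_counts(db_path, tables=("catalogue", "documents")):
    """Return {table: row count}; table names are assumed from the check labels."""
    conn = sqlite3.connect(db_path)
    try:
        return {
            table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in tables
        }
    finally:
        conn.close()


def service_state(unit):
    """Return the state printed by `systemctl is-active`, e.g. 'inactive'."""
    proc = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,  # non-zero exit is expected
    )
    return proc.stdout.strip()
```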