# Phase 2: Shared Filing Function

**Executed:** 2026-04-14 15:15 UTC

---

## Backup

| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db (pre-Phase 2) | CT 130: `/tmp/recon.db.phase2.20260414.bak` | `20ec1fec2247a999e7d42f6a716481b0` |
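
Before relying on the backup, its MD5 can be recomputed and compared with the hash recorded in the table. The following is a minimal sketch that uses only the standard library. The helper names `md5_of_file` and `verify_backup` are hypothetical and are not part of RECON.

```python
import hashlib
from pathlib import Path

# Hash recorded for the pre-Phase 2 recon.db backup.
EXPECTED_MD5 = "20ec1fec2247a999e7d42f6a716481b0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches `expected`."""
    return md5_of_file(Path(path)) == expected.lower()
```

On CT 130, the check would be `verify_backup("/tmp/recon.db.phase2.20260414.bak")`. Reading the file in chunks keeps memory use flat even for a large database file.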

---

## Git Setup (prerequisite work)

`/opt/recon` was not a git repository. It was initialized and pushed:

- **Repo:** https://forge.echo6.co/matt/recon (private)
- **Auth:** HTTPS with an API token (the SSH key on CT 130 was already registered elsewhere in Forgejo)
- **Initial commit:** `563c16b`, the full codebase baseline on `master`
- **Refactor branch:** `refactor`, created from `master`

---

## What Was Created

### `lib/filing.py`: `file_processed_item()` function

- **RECON branch:** `refactor`
- **Commit:** `de2c59a`

`file_processed_item()` is a shared filing function that any future processor can call to file a completed item from the processing stage into the organized library.

**Signature:**

```python
def file_processed_item(doc_hash, source_file_path, db, config, dry_run=False) -> dict: ...
```

**Return dict keys:** `hash`, `action`, `source_path`, `target_path`, `domain`, `subdomain`, `qdrant_points_updated`, `error`

**Action values:** `filed`, `skip_unclassified`, `skip_already_filed`, `would_file`, `error`
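
The return contract can be expressed as a type. This sketch makes some assumptions. The value types are inferred from the dry-run output in this report: for example, `error` is `None` on success and `qdrant_points_updated` is an `int`. The names `FilingResult`, `FilingAction` and `is_valid_result` are hypothetical and do not exist in `lib/filing.py`.

```python
from typing import Literal, Optional, TypedDict, get_args

# The five documented action values.
FilingAction = Literal[
    "filed", "skip_unclassified", "skip_already_filed", "would_file", "error"
]


class FilingResult(TypedDict):
    """Hypothetical typed view of the dict returned by file_processed_item()."""
    hash: str
    action: FilingAction
    source_path: str
    target_path: Optional[str]
    domain: Optional[str]
    subdomain: Optional[str]
    qdrant_points_updated: int
    error: Optional[str]


def is_valid_result(result: dict) -> bool:
    """True if `result` has exactly the documented keys and a known action."""
    return (
        set(result) == set(FilingResult.__annotations__)
        and result["action"] in get_args(FilingAction)
    )
```

Callers could use such a check in tests to catch a key being renamed or an undocumented action value slipping in.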

**What it does (in order):**

1. Verifies that the source file exists.
2. Calls `determine_dominant_domain()` to classify the document from its concept JSONs.
3. Looks up the original filename in the catalogue.
4. Calls `_build_target_path()` to derive the target path, with collision handling.
5. Checks idempotency: if the source path equals the target path, returns `skip_already_filed`.
6. In dry-run mode (`dry_run=True`), returns `would_file` without moving anything.
7. Moves the file with `shutil.move()`.
8. Updates the catalogue path and the documents path, then marks the item as organized.
9. Updates the Qdrant payloads (`download_url`, `filename`, `original_filename`).
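
The nine steps can be sketched as a single function. This is not the code in `lib/filing.py`. `file_item_sketch` and its `deps` object are hypothetical stand-ins that bundle the real dependencies: `determine_dominant_domain()`, the catalogue lookup, `_build_target_path()`, the `StatusDB` updates and `update_qdrant_payload()`. The branches for a missing file, `skip_unclassified`, and exceptions are also assumptions about how the documented action values arise.

```python
import shutil
from pathlib import Path


def file_item_sketch(doc_hash, source_file_path, deps, library_root, dry_run=False):
    """Hypothetical walk-through of the nine documented filing steps.

    `deps` is an assumed object exposing stand-ins for the real dependencies;
    it replaces the `db` and `config` arguments of file_processed_item().
    """
    source = Path(source_file_path)
    result = {
        "hash": doc_hash, "action": "error", "source_path": str(source),
        "target_path": None, "domain": None, "subdomain": None,
        "qdrant_points_updated": 0, "error": None,
    }
    # 1. Verify the source file exists.
    if not source.is_file():
        result["error"] = f"source file not found: {source}"
        return result
    # 2. Classify from concept JSONs (stand-in for determine_dominant_domain()).
    #    Assumption: no dominant domain means the item is skipped.
    classification = deps.classify(doc_hash)
    if classification is None:
        result["action"] = "skip_unclassified"
        return result
    domain, subdomain = classification
    result["domain"], result["subdomain"] = domain, subdomain
    # 3. Original filename from the catalogue (assumed fallback: current name).
    filename = deps.original_filename(doc_hash) or source.name
    # 4. Target path with collision handling (stand-in for _build_target_path()).
    target = Path(deps.build_target_path(library_root, domain, subdomain, filename, doc_hash))
    result["target_path"] = str(target)
    # 5. Idempotency: the file already sits at its canonical location.
    if source.resolve() == target.resolve():
        result["action"] = "skip_already_filed"
        return result
    # 6. Dry run: report the planned move and touch nothing.
    if dry_run:
        result["action"] = "would_file"
        return result
    try:
        # 7. Move the file (creating the target directory is an assumption).
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        # 8. Catalogue path, documents path, then organized_at.
        deps.update_catalogue_path(doc_hash, str(target), target.name)
        deps.sync_document_path(doc_hash, str(target), target.name)
        deps.mark_organized(doc_hash)
        # 9. Qdrant payloads; assumed to return the number of points updated.
        result["qdrant_points_updated"] = deps.update_qdrant_payload(
            doc_hash, str(target), target.name, filename
        )
    except Exception as exc:  # assumed: failures surface as action "error"
        result["error"] = str(exc)
        return result
    result["action"] = "filed"
    return result
```

The sketch follows the documented order. Every read-only check (steps 1 to 5) and the dry-run exit (step 6) come before the first side effect, so a dry run cannot modify the filesystem, the database or Qdrant.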

---

## Dependencies on Existing Code

| Module | Function/Method | Purpose |
|--------|-----------------|---------|
| `lib/organizer.py` | `determine_dominant_domain(doc_hash, data_dir)` | Domain classification from concept JSONs |
| `lib/organizer.py` | `_build_target_path(library_root, domain, subdomain, filename, doc_hash)` | Target path with collision handling |
| `lib/new_pipeline.py` | `update_qdrant_payload(doc_hash, new_path, new_filename, original_filename, config)` | Qdrant payload sync |
| `lib/status.py` | `StatusDB.update_catalogue_path(hash, path, filename)` | Catalogue DB update |
| `lib/status.py` | `StatusDB.sync_document_path(hash, path, filename)` | Documents DB update |
| `lib/status.py` | `StatusDB.mark_organized(hash)` | Set `organized_at` timestamp |
| `lib/status.py` | `StatusDB._get_conn()` | Thread-local SQLite connection |
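
`StatusDB._get_conn()` is listed as returning a thread-local SQLite connection. In Python, the usual way to get that behaviour is `threading.local()`. The class below is a hypothetical minimal sketch of the pattern and not the actual `StatusDB` code. It assumes each thread should lazily open its own connection to the same database file.

```python
import sqlite3
import threading


class ThreadLocalDB:
    """Hypothetical sketch: one SQLite connection per thread, opened lazily."""

    def __init__(self, db_path):
        self.db_path = db_path
        self._local = threading.local()

    def _get_conn(self):
        # sqlite3 connections refuse cross-thread use by default
        # (check_same_thread=True), so each thread keeps its own connection.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self.db_path)
            self._local.conn = conn
        return conn
```

Repeated calls from the same thread reuse one connection, while a second thread gets a separate one. This avoids the `ProgrammingError` that `sqlite3` raises when a connection is shared across threads.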

---

## Testing

### Import test

```
python3 -c "from lib.filing import file_processed_item; print('Import OK')"
→ Import OK
```

### Dry-run test against real data

Document: `3c8512868fa568a861c7994019ed5e88` (U.S. Army Reconnaissance And Surveillance Handbook)

```
action: would_file
domain: Defense & Tactics
subdomain: Reconnaissance
target_path: /mnt/library/Defense-and-Tactics/Reconnaissance/U.S. Army Reconnaissance And Surveillance Handbook.pdf
qdrant_points_updated: 0 (dry run: no actual update)
error: None
```

The function classified the document correctly, derived the canonical target path, and returned `would_file`. The move would also rename the file slightly: the source path uses underscores, while the target uses spaces.
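
The dry-run output suggests how the target path is formed: `<library root>/<domain dir>/<subdomain dir>/<original filename>`, with `Defense & Tactics` becoming `Defense-and-Tactics`. The real rules are in `_build_target_path()` and are not shown here. This sketch reproduces only the single observed example. Both the directory rule (`&` becomes `and` and spaces become hyphens) and the collision suffix (first 8 hash characters) are assumptions. `slugify_dir` and `build_target_path_sketch` are hypothetical names.

```python
from pathlib import Path


def slugify_dir(name):
    """Assumed directory-name rule, inferred from one example:
    'Defense & Tactics' becomes 'Defense-and-Tactics'."""
    return "-".join(name.replace("&", "and").split())


def build_target_path_sketch(library_root, domain, subdomain, filename, doc_hash):
    """Hypothetical stand-in for _build_target_path()."""
    target = Path(library_root) / slugify_dir(domain) / slugify_dir(subdomain) / filename
    if target.exists():
        # Assumed collision rule: add a short hash suffix before the extension.
        target = target.with_name(f"{target.stem}_{doc_hash[:8]}{target.suffix}")
    return target
```

A real collision check also has to recognise when the existing file is the same document. Otherwise, an already-filed item would get a suffixed path and never reach the `skip_already_filed` branch. This sketch does not handle that case.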

---

## What Did NOT Change

- **No existing files modified:** `lib/organizer.py`, `lib/status.py`, `lib/new_pipeline.py`, `lib/utils.py` and `recon.py` are all untouched.
- **No data modified:** the catalogue and documents tables both still hold 29,812 rows.
- **No service state changed:** both services (`recon.service` and `recon-watchdog.service`) remain inactive.
- **Processing directory empty:** no files were placed in `/opt/recon/data/processing/`.
- **Legacy `organize_document()` untouched:** it remains available for existing code paths.

---

## Verification

| Check | Result |
|-------|--------|
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| processing/ files | 0 |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Import test | passed |
| Dry-run test | passed (would_file) |
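
The count checks in this table can be scripted so they are repeatable before and after each phase. The following is a minimal sketch using `sqlite3` and `pathlib`. It assumes the database has tables named exactly `catalogue` and `documents`; the row labels suggest this, but the schema is not shown here. `verification_counts` is a hypothetical helper, and the service-state checks are omitted.

```python
import sqlite3
from pathlib import Path


def verification_counts(db_path, processing_dir):
    """Return catalogue/documents row counts and the number of files
    waiting in the processing directory (hypothetical helper)."""
    conn = sqlite3.connect(db_path)
    try:
        counts = {
            # Table names come from a fixed tuple, so the f-string is safe.
            table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            for table in ("catalogue", "documents")  # assumed table names
        }
    finally:
        conn.close()
    proc = Path(processing_dir)
    counts["processing_files"] = (
        sum(1 for p in proc.rglob("*") if p.is_file()) if proc.is_dir() else 0
    )
    return counts
```

For the Phase 2 state recorded above, the expected result would be `{"catalogue": 29812, "documents": 29812, "processing_files": 0}` when pointed at `recon.db` and `/opt/recon/data/processing/`.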