# Phase 2: Shared Filing Function

Executed: 2026-04-14 15:15 UTC
## Backup

| Item | Location | MD5 Hash |
|---|---|---|
| `recon.db` (pre-Phase 2) | CT 130: `/tmp/recon.db.phase2.20260414.bak` | `20ec1fec2247a999e7d42f6a716481b0` |
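Before relying on the backup, its hash can be re-checked against the value recorded above. A minimal sketch using only Python's standard library; the helper names `md5_of` and `backup_matches` are hypothetical and not part of RECON:

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Union

# Hash recorded for the pre-Phase 2 backup in the table above.
EXPECTED_MD5 = "20ec1fec2247a999e7d42f6a716481b0"


def md5_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(path: Union[str, Path], expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 equals the expected hex digest (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For example, `backup_matches("/tmp/recon.db.phase2.20260414.bak")` run on CT 130 should return `True` if the file is intact. Reading in chunks keeps memory use flat regardless of the database size.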
## Git Setup (prerequisite work)

`/opt/recon` was not a git repository, so it was initialized and pushed:

- Repo: https://forge.echo6.co/matt/recon (private)
- Auth: HTTPS with an API token (the SSH key on CT 130 was already registered elsewhere in Forgejo)
- Initial commit: `563c16b`, the full codebase baseline on `master`
- Refactor branch: `refactor`, created from `master`
## What Was Created

### `lib/filing.py`: `file_processed_item()`

- RECON branch: `refactor`
- Commit: `de2c59a`

`file_processed_item()` is a shared filing function that any future processor can call to file a completed item from the processing stage into the organized library.
Signature: `def file_processed_item(doc_hash, source_file_path, db, config, dry_run=False) -> dict`

Return dict keys: `hash`, `action`, `source_path`, `target_path`, `domain`, `subdomain`, `qdrant_points_updated`, `error`

Action values: `filed`, `skip_unclassified`, `skip_already_filed`, `would_file`, `error`
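The documented return value can be pinned down as a typed structure so that callers and tests agree on its shape. This is a sketch under the assumption that the real function returns a plain dict; `FilingResult`, `FilingAction`, `ACTIONS`, and `new_result()` are hypothetical names:

```python
from __future__ import annotations

from typing import Literal, Optional, TypedDict, get_args

# Hypothetical typing of the documented return value; the real function
# may simply build and return a plain dict with these keys.
FilingAction = Literal[
    "filed", "skip_unclassified", "skip_already_filed", "would_file", "error"
]
ACTIONS = frozenset(get_args(FilingAction))


class FilingResult(TypedDict):
    hash: str
    action: FilingAction
    source_path: str
    target_path: Optional[str]
    domain: Optional[str]
    subdomain: Optional[str]
    qdrant_points_updated: int
    error: Optional[str]


def new_result(doc_hash: str, source_path: str, action: str) -> FilingResult:
    """Build a result with every documented key present so that all return
    paths share one shape. Optional fields start as None, the counter at 0."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return FilingResult(
        hash=doc_hash,
        action=action,
        source_path=source_path,
        target_path=None,
        domain=None,
        subdomain=None,
        qdrant_points_updated=0,
        error=None,
    )
```

Starting every branch from the same fully populated dict means a caller can always read `result["error"]` or `result["target_path"]` without a `KeyError`, whichever action was taken.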
What it does (in order):

- Verifies that the source file exists.
- Calls `determine_dominant_domain()` to classify the document from its concept JSONs.
- Looks up the original filename from the catalogue.
- Calls `_build_target_path()` to derive the target path, with collision handling.
- Checks idempotency: if source == target, returns `skip_already_filed`.
- In `dry_run` mode, returns `would_file` without moving the file.
- Moves the file with `shutil.move()`.
- Updates the catalogue path and the documents path, and marks the item organized.
- Updates the Qdrant payloads (`download_url`, `filename`, `original_filename`).
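The steps above can be condensed into a sketch of the control flow. It is not the code in `lib/filing.py`: the real dependencies (`determine_dominant_domain()`, the catalogue lookup, `_build_target_path()`, and the StatusDB/Qdrant updates) are replaced by hypothetical injected callables, and reporting a missing source file as `action="error"` is an assumption:

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import Callable, Optional, Tuple


def file_item_sketch(
    doc_hash: str,
    source_file_path: str,
    classify: Callable[[str], Optional[Tuple[str, str]]],  # hypothetical
    lookup_filename: Callable[[str], str],  # hypothetical
    build_target: Callable[[str, str, str, str], Path],  # hypothetical
    record_move: Callable[[str, Path], int],  # hypothetical DB + Qdrant hook
    dry_run: bool = False,
) -> dict:
    """Illustrative control flow only; dependencies are injected callables."""
    result = {
        "hash": doc_hash, "action": "error", "source_path": source_file_path,
        "target_path": None, "domain": None, "subdomain": None,
        "qdrant_points_updated": 0, "error": None,
    }
    source = Path(source_file_path)

    # 1. Verify the source file exists (assumed to be reported as "error").
    if not source.is_file():
        result["error"] = f"source file not found: {source}"
        return result

    # 2. Classify; no dominant domain means the item cannot be filed yet.
    classification = classify(doc_hash)
    if classification is None:
        result["action"] = "skip_unclassified"
        return result
    domain, subdomain = classification
    result["domain"], result["subdomain"] = domain, subdomain

    # 3-4. Original filename from the catalogue, then a collision-aware target.
    target = Path(build_target(domain, subdomain, lookup_filename(doc_hash), doc_hash))
    result["target_path"] = str(target)

    # 5. Idempotency: the file already sits at its canonical location.
    if source.resolve() == target.resolve():
        result["action"] = "skip_already_filed"
        return result

    # 6. Dry run stops before touching the filesystem or any database.
    if dry_run:
        result["action"] = "would_file"
        return result

    # 7-9. Move the file, then hand off to the bookkeeping hook.
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        result["qdrant_points_updated"] = record_move(doc_hash, target)
        result["action"] = "filed"
    except Exception as exc:  # surface failures through the "error" action
        result["error"] = str(exc)
    return result
```

Injecting the dependencies keeps the branch order (classify, idempotency check, dry run, move) testable against a temporary directory without a database or Qdrant.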
## Dependencies on Existing Code

| Module | Function/Method | Purpose |
|---|---|---|
| `lib/organizer.py` | `determine_dominant_domain(doc_hash, data_dir)` | Domain classification from concept JSONs |
| `lib/organizer.py` | `_build_target_path(library_root, domain, subdomain, filename, doc_hash)` | Target path with collision handling |
| `lib/new_pipeline.py` | `update_qdrant_payload(doc_hash, new_path, new_filename, original_filename, config)` | Qdrant payload sync |
| `lib/status.py` | `StatusDB.update_catalogue_path(hash, path, filename)` | Catalogue DB update |
| `lib/status.py` | `StatusDB.sync_document_path(hash, path, filename)` | Documents DB update |
| `lib/status.py` | `StatusDB.mark_organized(hash)` | Set `organized_at` timestamp |
| `lib/status.py` | `StatusDB._get_conn()` | Thread-local SQLite connection |
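`StatusDB._get_conn()` is listed as providing a thread-local SQLite connection. A common way to build that with the standard library is `threading.local()`; the class below is a hypothetical, minimal stand-in and does not reproduce the rest of `StatusDB`:

```python
import sqlite3
import threading


class StatusDBSketch:
    """Hypothetical stand-in for StatusDB showing only the connection pattern."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        # Attributes set on this object are visible only to the thread that set them.
        self._local = threading.local()

    def _get_conn(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # Opened lazily, once per thread, and reused on later calls.
            conn = sqlite3.connect(self.db_path)
            self._local.conn = conn
        return conn
```

One reason for the pattern: `sqlite3` connections are created with `check_same_thread=True` by default and raise `ProgrammingError` when used from another thread, so giving each worker thread its own connection avoids sharing one.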
## Testing

### Import test

Command: `python3 -c "from lib.filing import file_processed_item; print('Import OK')"`

Output: `Import OK`
### Dry-run test against real data

Document: `3c8512868fa568a861c7994019ed5e88` (U.S. Army Reconnaissance And Surveillance Handbook)

- action: `would_file`
- domain: Defense & Tactics
- subdomain: Reconnaissance
- target_path: `/mnt/library/Defense-and-Tactics/Reconnaissance/U.S. Army Reconnaissance And Surveillance Handbook.pdf`
- qdrant_points_updated: 0 (dry run, so no actual update)
- error: `None`
The function correctly classified the document, derived the canonical path, and returned `would_file`. The source filename uses underscores while the target uses spaces, so filing this item will also apply a slight rename.
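The dry-run output hints at how the target path is derived: the domain "Defense & Tactics" becomes the directory `Defense-and-Tactics`, and the filename keeps its spaces (assumed to come from the catalogue's original filename). The sketch below reproduces that one observed example; the slug rule, the `current_path` parameter, and the hash-suffix collision scheme are all hypothetical and may differ from `_build_target_path()`:

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def dir_name(label: str) -> str:
    # Hypothetical rule inferred from one example:
    # "Defense & Tactics" -> "Defense-and-Tactics".
    return "-".join(label.replace("&", "and").split())


def build_target_path_sketch(
    library_root: PathLike,
    domain: str,
    subdomain: str,
    filename: str,
    doc_hash: str,
    current_path: Optional[PathLike] = None,  # hypothetical parameter
) -> Path:
    directory = Path(library_root) / dir_name(domain) / dir_name(subdomain)
    candidate = directory / filename
    # Use the plain name if it is free, or if this document already lives there.
    if not candidate.exists() or (
        current_path is not None
        and Path(current_path).resolve() == candidate.resolve()
    ):
        return candidate
    # Hypothetical collision rule: tag the name with a short hash prefix.
    return directory / f"{candidate.stem} [{doc_hash[:8]}]{candidate.suffix}"
```

Passing the current location lets an already-filed document resolve to its own path rather than to a collision variant, which is what keeps a source == target idempotency check meaningful.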
## What Did NOT Change

- No existing files modified: `lib/organizer.py`, `lib/status.py`, `lib/new_pipeline.py`, `lib/utils.py`, and `recon.py` are all untouched.
- No data modified: catalogue = 29,812 rows, documents = 29,812 rows (unchanged).
- No service state changed: both services remain inactive.
- Processing directory empty: no files were placed in `/opt/recon/data/processing/`.
- Legacy `organize_document()` untouched: it remains available for existing code paths.
## Verification
| Check | Result |
|---|---|
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| processing/ files | 0 |
| recon.service | inactive |
| recon-watchdog.service | inactive |
| Import test | passed |
| Dry-run test | passed (would_file) |
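The checks in this table can be scripted so they are repeatable after later phases. A minimal sketch, assuming the SQLite tables are literally named `catalogue` and `documents` (inferred from the row labels above) and that `systemctl` is available on the host; the function names are hypothetical:

```python
from __future__ import annotations

import sqlite3
import subprocess
from contextlib import closing
from pathlib import Path


def table_counts(db_path: str, tables=("catalogue", "documents")) -> dict[str, int]:
    # Table names are an assumption based on the row labels in the table above.
    with closing(sqlite3.connect(db_path)) as conn:
        return {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }


def processing_file_count(processing_dir: str = "/opt/recon/data/processing") -> int:
    path = Path(processing_dir)
    if not path.is_dir():
        raise FileNotFoundError(processing_dir)
    # Count regular files at any depth below the processing directory.
    return sum(1 for p in path.rglob("*") if p.is_file())


def service_state(name: str) -> str:
    try:
        # `systemctl is-active` prints the state and exits non-zero when inactive,
        # so the exit code is deliberately not checked here.
        out = subprocess.run(
            ["systemctl", "is-active", name], capture_output=True, text=True
        )
    except FileNotFoundError:
        return "unknown"  # systemctl not available on this host
    return out.stdout.strip() or "unknown"
```

Run against the live database (path not shown in this report), `table_counts(...)`, `processing_file_count()`, and `service_state("recon.service")` should reproduce the 29,812 / 29,812 / 0 / inactive values recorded above.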