53 commits

Author SHA1 Message Date
66fadb7487 Phase 3: dispatcher, transcript processor, text_dir resolution
- lib/dispatcher.py: one-shot dispatcher that scans acquired/<type>/
  for content+sidecar pairs and routes to registered processors
- lib/processors/transcript_processor.py: pre_flight() for transcripts
  (hash, dedupe, split into pages, register in DB, set text_dir)
- lib/utils.py: resolve_text_dir() helper for text_dir column fallback
- lib/enricher.py: use resolve_text_dir() instead of hardcoded path
- lib/embedder.py: use resolve_text_dir() instead of hardcoded path
- lib/processors/__init__.py, lib/acquisition/__init__.py: package inits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 15:39:42 +00:00
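
The one-shot dispatch pass described in the Phase 3 commit could look roughly like the sketch below. It is not the code in lib/dispatcher.py. The registry dict, the register() decorator, the function names, and the "<content name>.json" sidecar convention are all assumptions made for illustration; the commit message only says that content+sidecar pairs under acquired/<type>/ are routed to registered processors.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical registry: name of an acquired/<type>/ directory -> processor.
Processor = Callable[[Path, Path], None]
PROCESSORS: Dict[str, Processor] = {}


def register(content_type: str) -> Callable[[Processor], Processor]:
    """Decorator that registers a processor for one content type (assumed API)."""
    def wrap(func: Processor) -> Processor:
        PROCESSORS[content_type] = func
        return func
    return wrap


def find_pairs(type_dir: Path, sidecar_suffix: str = ".json") -> Iterator[Tuple[Path, Path]]:
    """Yield (content, sidecar) pairs.

    Assumes the sidecar is named '<content file name>.json'; files that
    themselves end in the sidecar suffix are never treated as content.
    """
    for content in sorted(type_dir.iterdir()):
        if not content.is_file() or content.suffix == sidecar_suffix:
            continue
        sidecar = content.with_name(content.name + sidecar_suffix)
        if sidecar.is_file():
            yield content, sidecar


def dispatch_once(acquired_root: Path) -> List[Tuple[str, str]]:
    """Make a single pass over acquired/<type>/ and route complete pairs.

    Returns a (type, content file name) tuple for every item handed to a processor.
    """
    routed: List[Tuple[str, str]] = []
    if not acquired_root.is_dir():
        return routed
    for type_dir in sorted(p for p in acquired_root.iterdir() if p.is_dir()):
        processor = PROCESSORS.get(type_dir.name)
        if processor is None:
            continue  # no processor registered for this type; leave files in place
        for content, sidecar in find_pairs(type_dir):
            processor(content, sidecar)
            routed.append((type_dir.name, content.name))
    return routed
```

In this sketch a content file without a sidecar is skipped rather than treated as an error, so a later pass can pick it up once the sidecar has been written. Whether the real dispatcher behaves this way is not stated in the commit.
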
de2c59a501 Phase 2: add shared filing function (lib/filing.py)
New reusable file_processed_item() that future processors will call to file
completed items from /opt/recon/data/processing/{hash}/ into the library.

Reuses existing organizer logic for domain classification and collision handling.
Not yet wired into the service loop — exists as library code for Phase 3+ to call.

Phase 2 of the refactor. See https://forge.echo6.co/matt/refactored-recon

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 15:03:36 +00:00
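
As a rough illustration of the filing step in the Phase 2 commit, the sketch below moves a finished item directory from a processing root into a per-domain library directory. It is not the code in lib/filing.py: the signature, the library_root and domain parameters, and the numeric-suffix collision policy are assumptions. The commit says the real function reuses the existing organizer logic for domain classification and collision handling, which is not shown here, so the domain is simply passed in.

```python
import shutil
from pathlib import Path


def unique_destination(dest: Path) -> Path:
    """Return dest, or dest with a numeric suffix if that path is already taken.

    The '_1', '_2', ... scheme is an assumed stand-in for the organizer's
    collision handling.
    """
    if not dest.exists():
        return dest
    counter = 1
    while True:
        candidate = dest.with_name(f"{dest.stem}_{counter}{dest.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def file_processed_item(
    processing_root: Path,
    item_hash: str,
    library_root: Path,
    domain: str,
) -> Path:
    """Move processing_root/<hash>/ into library_root/<domain>/ and return the new path.

    'domain' is supplied by the caller in this sketch; classification is out of scope.
    """
    source_dir = processing_root / item_hash
    if not source_dir.is_dir():
        raise FileNotFoundError(f"no processed item at {source_dir}")
    domain_dir = library_root / domain
    domain_dir.mkdir(parents=True, exist_ok=True)
    dest = unique_destination(domain_dir / item_hash)
    shutil.move(str(source_dir), str(dest))
    return dest
```

A caller would pass the processing root named in the commit, for example file_processed_item(Path("/opt/recon/data/processing"), item_hash, library_root, domain), where library_root and domain are placeholders for values the real code derives elsewhere.
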
563c16bb71 Initial commit: RECON codebase baseline
Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:57:23 +00:00