2 commits

Author SHA1 Message Date
9fe6a0a782 Phase 4: Phase 3 cleanup fixes
Fix 1.1: filing preserves source file extension instead of defaulting to .pdf
Fix 1.2: back-fixed soldering transcript from .pdf to .txt (hash 380dbc78)
Fix 1.3: dispatcher logs missing processor modules at DEBUG, not ERROR
Fix 1.4: transcript processor cleans stale processing/concepts dirs on entry
Also: dispatcher now handles solo content files without .meta.json sidecar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 16:39:57 +00:00
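
Fix 1.1 above could look roughly like the sketch below. The repository's actual filing code is not shown in this log, so `filed_name`, its parameters, and the hash-based naming scheme are hypothetical; falling back to .pdf when the source has no extension is also an assumption.

```python
from pathlib import Path


def filed_name(source: Path, content_hash: str, default_ext: str = ".pdf") -> str:
    """Hypothetical helper: name a filed copy after its hash, keeping the source extension.

    The extension is taken from the source file and lowercased. The default
    is used only when the source has no extension at all (an assumption).
    """
    ext = source.suffix.lower() or default_ext
    return f"{content_hash}{ext}"


if __name__ == "__main__":
    # A .txt transcript keeps its extension instead of being filed as .pdf.
    print(filed_name(Path("soldering.txt"), "380dbc78"))
```

Under this scheme a file is only ever filed as .pdf when it arrived as one or arrived with no extension, which is the mislabeling Fix 1.2 had to correct after the fact.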
66fadb7487 Phase 3: dispatcher, transcript processor, text_dir resolution
- lib/dispatcher.py: one-shot dispatcher that scans acquired/<type>/
  for content+sidecar pairs and routes to registered processors
- lib/processors/transcript_processor.py: pre_flight() for transcripts
  (hash, dedupe, split into pages, register in DB, set text_dir)
- lib/utils.py: resolve_text_dir() helper for text_dir column fallback
- lib/enricher.py: use resolve_text_dir() instead of hardcoded path
- lib/embedder.py: use resolve_text_dir() instead of hardcoded path
- lib/processors/__init__.py, lib/acquisition/__init__.py: package inits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 15:39:42 +00:00
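
The dispatcher scan described in the Phase 3 message might be sketched as follows. This is not the code in lib/dispatcher.py: `scan_acquired` is a hypothetical name, and the sidecar convention assumed here (`<stem>.meta.json` next to the content file) is a guess at the layout. Content files without a sidecar are yielded with `None` rather than skipped.

```python
from pathlib import Path
from typing import Iterator, Optional, Tuple

SIDECAR_SUFFIX = ".meta.json"  # assumed sidecar naming: <stem>.meta.json


def scan_acquired(type_dir: Path) -> Iterator[Tuple[Path, Optional[Path]]]:
    """Yield (content, sidecar) pairs from one acquired/<type>/ directory.

    The sidecar is None when a content file has no matching .meta.json.
    """
    for path in sorted(type_dir.iterdir()):
        # Skip subdirectories and the sidecars themselves.
        if not path.is_file() or path.name.endswith(SIDECAR_SUFFIX):
            continue
        sidecar = path.with_name(path.stem + SIDECAR_SUFFIX)
        yield (path, sidecar if sidecar.exists() else None)


if __name__ == "__main__":
    # Hypothetical directory; adjust to a real acquired/<type>/ path.
    target = Path("acquired/transcript")
    if target.is_dir():
        for content, sidecar in scan_acquired(target):
            print(content.name, "->", sidecar.name if sidecar else "(no sidecar)")
```

Sorting the directory listing keeps a one-shot run deterministic, which makes the dispatcher's log output easier to compare between runs.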