Phase 4: Phase 3 cleanup fixes

Fix 1.1: filing preserves source file extension instead of defaulting to .pdf
Fix 1.2: back-fixed soldering transcript from .pdf to .txt (hash 380dbc78)
Fix 1.3: dispatcher logs missing processor modules at DEBUG, not ERROR
Fix 1.4: transcript processor cleans stale processing/concepts dirs on entry
Also: dispatcher now handles solo content files without .meta.json sidecar

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matt 2026-04-14 16:39:57 +00:00
commit 9fe6a0a782
3 changed files with 56 additions and 17 deletions

@@ -95,6 +95,15 @@ def file_processed_item(doc_hash, source_file_path, db, config, dry_run=False):
         result["action"] = "skip_unclassified"
         return result
+    # Fix 1.1: Preserve the source file's actual extension instead of
+    # the default .pdf that sanitize_filename() may have applied
+    source_ext = os.path.splitext(source_file_path)[1].lower()
+    if source_ext:
+        target_stem, _old_ext = os.path.splitext(target_path)
+        target_path = target_stem + source_ext
+        san_stem, _old_ext = os.path.splitext(sanitized_name)
+        sanitized_name = san_stem + source_ext
+    result["target_path"] = target_path
     # If already at target (idempotency), just mark organized
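
The extension swap in Fix 1.1 can be tried in isolation. The sketch below lifts the same os.path.splitext logic into a standalone helper. The helper name preserve_source_extension and the sample paths are hypothetical, chosen for illustration only; they are not part of this commit.

```python
import os


def preserve_source_extension(source_file_path, target_path, sanitized_name):
    """Return (target_path, sanitized_name) carrying the source file's extension.

    Hypothetical helper mirroring the Fix 1.1 hunk; not code from the repository.
    """
    # Lowercase so "TALK.TXT" and "talk.txt" are filed under the same extension.
    source_ext = os.path.splitext(source_file_path)[1].lower()
    if not source_ext:
        # The source has no extension, so keep whatever was already chosen.
        return target_path, sanitized_name
    target_stem, _old_ext = os.path.splitext(target_path)
    san_stem, _old_ext = os.path.splitext(sanitized_name)
    return target_stem + source_ext, san_stem + source_ext


if __name__ == "__main__":
    # A transcript whose sanitized name was given the default .pdf extension.
    print(preserve_source_extension(
        "inbox/Soldering Talk.TXT",
        "library/audio/soldering_talk.pdf",
        "soldering_talk.pdf",
    ))
    # prints ('library/audio/soldering_talk.txt', 'soldering_talk.txt')
```

One caveat with this approach: os.path.splitext treats everything after the last dot in the final path component as the extension, so a target name such as notes.v2 with no real extension would come out as notes.txt rather than notes.v2.txt. Whether sanitize_filename() can produce such names is not visible in this hunk.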