New lib/acquisition/peertube.py replaces the removed peertube_scanner_loop.
Polls PeerTube API every 30min, dedupes against catalogue (UUID + title),
writes flat file pairs to data/acquired/stream/ for the dispatcher.
- acquire_batch(): one-shot find-and-acquire with rate limiting
- acquisition_loop(): service thread wrapper (interval from config)
- list_new_videos(): dedup via _build_known_sets() against catalogue
- acquire_one(): fetch VTT, convert, write .tmp then rename atomically
cmd_service(): added peertube-acq daemon thread
cmd_ingest_peertube(): rewired to use acquire_batch(), drops --channel/
--since/--enrich/--process (dispatcher handles full pipeline)
config.yaml: added peertube.poll_interval: 1800
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- lib/dispatcher.py: one-shot dispatcher that scans acquired/<type>/
for content+sidecar pairs and routes to registered processors
- lib/processors/transcript_processor.py: pre_flight() for transcripts
(hash, dedupe, split into pages, register in DB, set text_dir)
- lib/utils.py: resolve_text_dir() helper for text_dir column fallback
- lib/enricher.py: use resolve_text_dir() instead of hardcoded path
- lib/embedder.py: use resolve_text_dir() instead of hardcoded path
- lib/processors/__init__.py, lib/acquisition/__init__.py: package inits
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>