Mirror of https://github.com/zvx-echo6/recon.git, synced 2026-05-20 06:34:40 +02:00.
Transcripts are derived text from PeerTube videos, not primary source files, so they do not belong in `library/Domain/Subdomain/` like PDFs.

Change: `transcript_processor.pre_flight()` now sets `organized_at = CURRENT_TIMESTAMP` at the end of successful processing, marking the transcript as organized in place. The watch URL remains in `catalogue.path` and in the Qdrant `download_url`, so users who click a search result go to the PeerTube video.

The filing worker's `path LIKE` filter already excludes transcripts, because their `documents.path` is the watch URL rather than a filesystem path. No filing-worker changes are needed.

Back-fills 2,260 drain items from Phase 5c-2 via a one-time SQL update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
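The database side of this change can be sketched in Python against an in-memory SQLite database. This is a minimal illustration under stated assumptions, not the project's code: the database engine, the table layout, the `source_type` column, the `'/%'` LIKE pattern, the sample rows, and the helper names `mark_transcript_organized`, `backfill_transcripts`, `make_demo_db`, and `demo` are all hypothetical. Only `documents.path`, `organized_at`, `CURRENT_TIMESTAMP`, and the idea of a path-based LIKE filter come from the commit message.

```python
import sqlite3

# Hypothetical schema. Only `path` and `organized_at` are named in the
# commit message; `id` and `source_type` are assumptions for this sketch.
SCHEMA = """
CREATE TABLE documents (
    id           INTEGER PRIMARY KEY,
    path         TEXT NOT NULL,   -- filesystem path, or watch URL for transcripts
    source_type  TEXT NOT NULL,   -- e.g. 'pdf' or 'transcript' (assumed)
    organized_at TEXT             -- NULL until the document is organized
);
"""

# Hypothetical filing-worker query. The real LIKE pattern is not shown in
# the commit; '/%' stands in for "looks like an absolute filesystem path".
# A watch URL such as 'https://...' never matches it.
FILING_QUERY = """
SELECT id, path FROM documents
WHERE organized_at IS NULL AND path LIKE '/%'
ORDER BY id
"""


def mark_transcript_organized(conn: sqlite3.Connection, doc_id: int) -> None:
    """Hypothetical stand-in for the tail of a successful pre_flight().

    The row keeps its watch URL as `path`; nothing is moved on disk.
    """
    conn.execute(
        "UPDATE documents SET organized_at = CURRENT_TIMESTAMP WHERE id = ?",
        (doc_id,),
    )
    conn.commit()


def backfill_transcripts(conn: sqlite3.Connection) -> int:
    """Hypothetical one-time back-fill for transcripts processed earlier.

    Returns the number of rows updated.
    """
    cur = conn.execute(
        "UPDATE documents SET organized_at = CURRENT_TIMESTAMP "
        "WHERE source_type = 'transcript' AND organized_at IS NULL"
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with one PDF and two transcripts (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO documents (path, source_type) VALUES (?, ?)",
        [
            ("/data/inbox/manual.pdf", "pdf"),
            ("https://peertube.example/w/abc123", "transcript"),
            ("https://peertube.example/w/def456", "transcript"),
        ],
    )
    conn.commit()
    return conn


def demo():
    conn = make_demo_db()
    # Only the PDF is a filing candidate; both watch URLs fail the LIKE filter.
    candidates = [path for _, path in conn.execute(FILING_QUERY)]
    mark_transcript_organized(conn, 2)       # a newly processed transcript
    backfilled = backfill_transcripts(conn)  # catches the older one (id 3)
    pending = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE organized_at IS NULL"
    ).fetchone()[0]
    conn.close()
    return candidates, backfilled, pending
```

With this sample data, `demo()` returns `(['/data/inbox/manual.pdf'], 1, 1)`: one filing candidate, one back-filled transcript, and one document (the PDF) still waiting to be filed. Marking transcripts as organized in place lets them leave the unorganized backlog without the filing worker needing a transcript-specific branch.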
| Name |
|---|
| `acquisition` |
| `processors` |
| `__init__.py` |
| `api.py` |
| `crawler.py` |
| `dispatcher.py` |
| `embedder.py` |
| `enricher.py` |
| `extractor.py` |
| `filing.py` |
| `ingester.py` |
| `key_manager.py` |
| `new_pipeline.py` |
| `organizer.py` |
| `peertube_collector.py` |
| `peertube_scraper.py` |
| `status.py` |
| `utils.py` |
| `web_scraper.py` |