matt/recon

mirror of https://github.com/zvx-echo6/recon.git synced 2026-05-20 06:34:40 +02:00

Matt 62539861f2 Phase 6f: text processor for .txt file ingestion	2026-04-15 22:39:31 +00:00

New processor: lib/processors/text_processor.py
Handles plain text files (.txt) as primary source documents.

Pipeline: acquired/text/ -> dispatcher -> text_processor.pre_flight() -> enrich -> embed -> filing worker -> library/Domain/Subdomain/

Metadata extraction via two-source vote:
- Source A: filename parsing (title from filename)
- Source B: Gemini LLM extraction (title/author/edition/year from first 3 pages of text)

Page splitting reuses chunk_text() from lib/web_scraper.py.
Filing behavior matches PDFs (files to library, not organized in-place like transcripts).

Config: adds text: text_processor to pipeline.dispatch map.
New hopper subfolder: data/acquired/text/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

acquisition	Phase 6d: PeerTube acquisition module + service thread	2026-04-15 03:08:51 +00:00
processors	Phase 6f: text processor for .txt file ingestion	2026-04-15 22:39:31 +00:00
__init__.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
api.py	Revert "Phase 6e: rewire dashboard PeerTube endpoint to acquisition module"	2026-04-15 03:20:46 +00:00
dispatcher.py	Phase 5c-1: dispatcher loop, filing worker loop, service rewire	2026-04-14 18:30:58 +00:00
embedder.py	Phase 3: dispatcher, transcript processor, text_dir resolution	2026-04-14 15:39:42 +00:00
enricher.py	Phase 3: dispatcher, transcript processor, text_dir resolution	2026-04-14 15:39:42 +00:00
extractor.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
filing.py	Phase 5c-1: dispatcher loop, filing worker loop, service rewire	2026-04-14 18:30:58 +00:00
ingester.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
key_manager.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
new_pipeline.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
organizer.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
peertube_collector.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
peertube_scraper.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
status.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
utils.py	Phase 3: dispatcher, transcript processor, text_dir resolution	2026-04-14 15:39:42 +00:00
web_scraper.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
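
The Phase 6f commit message mentions metadata extraction via a two-source vote (filename parsing plus LLM extraction) but does not show how the two sources are reconciled. The following is a minimal sketch of one way such a vote could work, not the code in lib/processors/text_processor.py. The function names `title_from_filename` and `vote`, the agreement rule, and the result shape are all assumptions made for illustration; the LLM result is passed in as a plain dict, so no Gemini call is made here.

```python
import re
from pathlib import Path
from typing import Optional

# Fields named in the commit message for the LLM source.
FIELDS = ("title", "author", "edition", "year")


def title_from_filename(path: str) -> str:
    """Source A (hypothetical): derive a title from the file name alone."""
    stem = Path(path).stem
    # Treat underscores and hyphens as word separators, then collapse whitespace.
    return re.sub(r"\s+", " ", re.sub(r"[_\-]+", " ", stem)).strip()


def _norm(value: Optional[str]) -> str:
    """Lowercase and strip punctuation so near-identical strings compare equal."""
    return re.sub(r"\W+", " ", value or "").strip().lower()


def vote(filename_meta: dict, llm_meta: dict) -> dict:
    """Combine both sources field by field.

    Assumed rule: if both sources supply a value and the normalized values
    match, the field is marked as agreed. Otherwise the LLM value is used
    when present, falling back to the filename value.
    """
    result = {}
    for field in FIELDS:
        a = filename_meta.get(field)
        b = llm_meta.get(field)
        if a and b and _norm(a) == _norm(b):
            result[field] = {"value": b, "agreed": True}
        else:
            result[field] = {"value": b or a, "agreed": False}
    return result


if __name__ == "__main__":
    source_a = {"title": title_from_filename("data/acquired/text/field_manual-2nd_edition.txt")}
    # Stand-in for what an LLM extraction step might return.
    source_b = {"title": "Field Manual: 2nd Edition", "author": "J. Doe", "year": "1999"}
    for field, outcome in vote(source_a, source_b).items():
        print(field, outcome)
```

Only the title can ever be marked as agreed in this sketch, since the filename source supplies nothing else; a real implementation might weight the sources differently or flag disagreements for review.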