recon/lib
Matt 8945c82e3f Replace wget/SingleFile/Playwright backends with Zimit
- Zimit Docker container handles all site types (static, SPA, JS redirects)
- Removed: _detect_crawl_mode, _crawl_wget, _crawl_singlefile, preflight logic
- Added: _crawl_zimit() with Docker lifecycle management
- Simplified pipeline: submit → Zimit crawl → kiwix-manage register → done
- No more zimwriterfs step — Zimit produces ZIM directly
- Dashboard UI simplified: removed crawl mode dropdown
- Config simplified: removed reject patterns, preflight, singlefile sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-19 14:06:23 +00:00
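
The simplified pipeline in the commit message above (submit, Zimit crawl, kiwix-manage register) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual `_crawl_zimit()`. The function names, the image reference, the `--seeds` flag (older Zimit releases used `--url`), and the output filename pattern are all assumptions to check against the installed Zimit version.

```python
import subprocess
from pathlib import Path

# Assumed image reference; pin a specific tag in practice.
ZIMIT_IMAGE = "ghcr.io/openzim/zimit"


def build_zimit_cmd(url, name, output_dir, seed_flag="--seeds"):
    """Build the docker argv for one Zimit crawl (hypothetical helper).

    seed_flag is an assumption: check `zimit --help` for your version.
    """
    host_dir = Path(output_dir).resolve()  # docker bind mounts need absolute paths
    return [
        "docker", "run", "--rm",          # --rm removes the container when it exits
        "-v", f"{host_dir}:/output",      # Zimit writes the ZIM under /output
        ZIMIT_IMAGE, "zimit",
        seed_flag, url,
        "--name", name,
    ]


def build_register_cmd(library_xml, zim_path):
    """Build the kiwix-manage argv that adds a ZIM to a library file."""
    return ["kiwix-manage", str(library_xml), "add", str(zim_path)]


def crawl_and_register(url, name, output_dir, library_xml, run=subprocess.run):
    """Crawl a site with Zimit, then register the newest matching ZIM.

    `run` is injectable so the flow can be exercised without Docker.
    """
    output_dir = Path(output_dir)
    run(build_zimit_cmd(url, name, output_dir), check=True)

    # Assumed naming: the ZIM filename starts with the --name value.
    zims = sorted(output_dir.glob(f"{name}*.zim"), key=lambda p: p.stat().st_mtime)
    if not zims:
        raise FileNotFoundError(f"no ZIM produced for {name!r} in {output_dir}")

    run(build_register_cmd(library_xml, zims[-1]), check=True)
    return zims[-1]
```

Because Zimit emits the ZIM file directly, there is no separate zimwriterfs packaging step between the crawl and the registration call.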
acquisition Phase 6d: PeerTube acquisition module + service thread 2026-04-15 03:08:51 +00:00
processors Filter non-English articles from ZIM ingestion 2026-04-17 07:30:30 +00:00
__init__.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
api.py Replace wget/SingleFile/Playwright backends with Zimit 2026-04-19 14:06:23 +00:00
dispatcher.py Phase 6f-2: format normalizer in dispatcher 2026-04-15 23:08:19 +00:00
embedder.py Fix Kiwix download URL generation in embedder 2026-04-18 00:06:52 +00:00
enricher.py Add langdetect language filter to enricher + purge non-English ZIM articles 2026-04-17 14:37:13 +00:00
extractor.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
filing.py Phase 5c-1: dispatcher loop, filing worker loop, service rewire 2026-04-14 18:30:58 +00:00
ingester.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
key_manager.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
new_pipeline.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
organizer.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
peertube_collector.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
peertube_scraper.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
scraper_runner.py Replace wget/SingleFile/Playwright backends with Zimit 2026-04-19 14:06:23 +00:00
status.py Add scraper Phase 2: smart crawl mode detection + browser fallback 2026-04-18 18:26:43 +00:00
utils.py Phase 3: dispatcher, transcript processor, text_dir resolution 2026-04-14 15:39:42 +00:00
web_scraper.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
zim_monitor.py Phase 1: Kiwix foundation — ZIM monitor and kiwix-serve setup 2026-04-16 23:39:34 +00:00