recon/lib
Matt da50e5f0b8 Add scraper Phase 2: smart crawl mode detection + browser fallback
- Pre-flight detection: wget + Playwright probe to auto-detect if site
  needs browser rendering (JS apps, parking page redirects)
- SingleFile CLI crawl backend for JS-rendered sites
- crawl_mode column in scrape_jobs (static/browser/redirect/auto)
- API: optional crawl_mode param on submit, cleared on retry
- Config: rate_limit_delay 2.0→0.5, /api/ reject pattern, preflight
  + singlefile config sections
- Prerequisites: Node.js 22, single-file-cli, Playwright + Chromium

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 18:26:43 +00:00
..
acquisition Phase 6d: PeerTube acquisition module + service thread 2026-04-15 03:08:51 +00:00
processors Filter non-English articles from ZIM ingestion 2026-04-17 07:30:30 +00:00
__init__.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
api.py Add scraper Phase 2: smart crawl mode detection + browser fallback 2026-04-18 18:26:43 +00:00
dispatcher.py Phase 6f-2: format normalizer in dispatcher 2026-04-15 23:08:19 +00:00
embedder.py Fix Kiwix download URL generation in embedder 2026-04-18 00:06:52 +00:00
enricher.py Add langdetect language filter to enricher + purge non-English ZIM articles 2026-04-17 14:37:13 +00:00
extractor.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
filing.py Phase 5c-1: dispatcher loop, filing worker loop, service rewire 2026-04-14 18:30:58 +00:00
ingester.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
key_manager.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
new_pipeline.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
organizer.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
peertube_collector.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
peertube_scraper.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
scraper_runner.py Add scraper Phase 2: smart crawl mode detection + browser fallback 2026-04-18 18:26:43 +00:00
status.py Add scraper Phase 2: smart crawl mode detection + browser fallback 2026-04-18 18:26:43 +00:00
utils.py Phase 3: dispatcher, transcript processor, text_dir resolution 2026-04-14 15:39:42 +00:00
web_scraper.py Initial commit: RECON codebase baseline 2026-04-14 14:57:23 +00:00
zim_monitor.py Phase 1: Kiwix foundation — ZIM monitor and kiwix-serve setup 2026-04-16 23:39:34 +00:00