recon

matt/recon


mirror of https://github.com/zvx-echo6/recon.git synced 2026-05-20 06:34:40 +02:00

Commit graph

Author	SHA1	Message	Date
Matt	f0b160ef7c	Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup	2026-04-19 02:28:49 +00:00
Matt	7c1af0f063	Phase 1: Kiwix foundation — ZIM monitor and kiwix-serve setup	2026-04-16 23:39:34 +00:00
Matt	563c16bb71	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00

Matt

f0b160ef7c

Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup

- Extract shared _full_zim_cleanup(source_id) from api_kiwix_remove
- Add SIGHUP to kiwix-serve after kiwix-manage remove
- Delete linked scrape_jobs rows during ZIM removal
- Update api_scraper_delete to do full ZIM cleanup when applicable
- Set chromium_path for single-file browser crawl support
- Add status.db to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-19 02:28:49 +00:00
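The f0b160ef7c message lists the removal steps but not their code. Below is a minimal Python sketch of that sequence, not the actual _full_zim_cleanup(source_id), whose internals are not shown here. It assumes a zim_sources table with a zim_id column, a scrape_jobs table linked by a source_id column, a kiwix library XML path, and a known kiwix-serve PID; all of these names are hypothetical. It uses the kiwix-manage LIBRARY_PATH remove ZIM_ID form, and the CLI call and the signal are injectable so the flow can be exercised without a running kiwix-serve.

```python
import os
import signal
import sqlite3
import subprocess
from typing import Callable, List


def full_zim_cleanup_sketch(
    conn: sqlite3.Connection,
    source_id: int,
    library_xml: str,
    kiwix_pid: int,
    run_cmd: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
    reload_server: Callable[[int], None] = lambda pid: os.kill(pid, signal.SIGHUP),
) -> bool:
    """Remove one ZIM source end to end. Returns False if source_id is unknown.

    Hypothetical schema: zim_sources(id, zim_id, ...), scrape_jobs(id, source_id, ...).
    """
    row = conn.execute(
        "SELECT zim_id FROM zim_sources WHERE id = ?", (source_id,)
    ).fetchone()
    if row is None:
        return False
    zim_id = row[0]

    # 1. Drop the book from the kiwix library file.
    run_cmd(["kiwix-manage", library_xml, "remove", zim_id])
    # 2. Send SIGHUP so the running kiwix-serve reloads its library.
    reload_server(kiwix_pid)

    # 3. Delete linked scrape_jobs rows and the source row in one transaction
    #    (the connection context manager commits, or rolls back on error).
    with conn:
        conn.execute("DELETE FROM scrape_jobs WHERE source_id = ?", (source_id,))
        conn.execute("DELETE FROM zim_sources WHERE id = ?", (source_id,))
    return True
```

Injecting run_cmd and reload_server keeps the ordering (library edit, then SIGHUP, then row deletion) testable with stubs; with the defaults, the function shells out to kiwix-manage and signals the server process directly.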

Matt

7c1af0f063

Phase 1: Kiwix foundation — ZIM monitor and kiwix-serve setup

- Add lib/zim_monitor.py: polls the kiwix-serve OPDS v2 catalog, detects
  new ZIMs, reads each one's accurate article count from its Counter
  metadata via python-libzim (not the inflated OPDS count), and inserts a
  row into the zim_sources table. Idempotent on re-run; marks removed ZIMs.
- DB schema: zim_sources, zim_samples, zim_articles tables (created
  via sqlite3, not in migrations — matches existing RECON pattern)
- kiwix-tools 3.7.0 installed from a binary tarball into /opt/recon/bin/
  (Ubuntu 24.04 apt ships 3.5.0, which lacks OPDS v2)
- kiwix.service systemd unit on port 8430
- python-libzim 3.9.0 installed
- Test ZIM: Appropedia EN maxi (496 MB, 19,445 articles)
- Add bin/ to .gitignore (binary tarball, not source)

2026-04-16 23:39:34 +00:00
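The 7c1af0f063 message describes two parts of the monitor: deriving an article count from Counter metadata, and an idempotent insert that also marks vanished ZIMs. The Python sketch below is a guess at both, not the contents of lib/zim_monitor.py. It assumes Counter is a semicolon-separated list of mimetype=count pairs and that "articles" means text/html entries. It also takes the catalog as an already-parsed dict mapping zim_id to (title, article_count) instead of fetching OPDS. The zim_sources columns used here are hypothetical.

```python
import sqlite3
from typing import Dict, Tuple


def articles_from_counter(counter: str, mimetype: str = "text/html") -> int:
    """Sum the entry counts for `mimetype` in a ZIM Counter metadata string.

    Assumed format: "text/html=19445;image/png=312;...". With python-libzim the
    string would come from something like
    Archive(path).get_metadata("Counter").decode("utf-8").
    """
    total = 0
    for part in counter.split(";"):
        key, sep, value = part.strip().rpartition("=")
        if sep and key == mimetype and value.isdigit():
            total += int(value)
    return total


def sync_zim_sources(
    conn: sqlite3.Connection, catalog: Dict[str, Tuple[str, int]]
) -> Tuple[int, int]:
    """Insert unseen catalog ZIMs; mark known ZIMs missing from the catalog as removed.

    Hypothetical schema:
      zim_sources(zim_id TEXT UNIQUE, title TEXT, article_count INTEGER, removed INTEGER)
    Returns (rows_inserted, rows_newly_marked_removed).
    """
    with conn:
        before = conn.total_changes
        # INSERT OR IGNORE skips zim_ids already present, so re-runs are no-ops.
        conn.executemany(
            "INSERT OR IGNORE INTO zim_sources (zim_id, title, article_count, removed) "
            "VALUES (?, ?, ?, 0)",
            [(zid, title, count) for zid, (title, count) in catalog.items()],
        )
        inserted = conn.total_changes - before
        # Any active row whose zim_id is no longer in the catalog gets flagged.
        active = [r[0] for r in conn.execute("SELECT zim_id FROM zim_sources WHERE removed = 0")]
        gone = [(zid,) for zid in active if zid not in catalog]
        conn.executemany("UPDATE zim_sources SET removed = 1 WHERE zim_id = ?", gone)
    return inserted, len(gone)
```

Because known ZIMs are ignored on insert and only active rows are candidates for removal, calling sync_zim_sources twice with the same catalog changes nothing the second time. One gap in this sketch: a ZIM that reappears after being marked removed stays marked, which a real monitor would likely want to reset.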

Matt

563c16bb71

Initial commit: RECON codebase baseline

Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-04-14 14:57:23 +00:00

3 commits