Parse "crawled":N from Browsertrix crawlStatus JSON logs instead of
matching an "N pages" pattern. Also check stdout, not just stderr.
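
A minimal sketch of this parsing, assuming each log entry is a single-line
JSON object with "context": "crawlStatus" and an integer "crawled" under
"details". The function name `last_crawled_count` and the exact log shape
are assumptions for illustration, not taken from the project code.

```python
import json
from typing import Optional


def last_crawled_count(stdout: str, stderr: str) -> Optional[int]:
    """Return the highest "crawled" value seen in crawlStatus log lines.

    Assumed log shape (hypothetical, one JSON object per line):
    {"context": "crawlStatus", "details": {"crawled": N, ...}}
    """
    best = None
    # Scan both streams, since status lines may land on either one.
    for stream in (stdout, stderr):
        for line in stream.splitlines():
            line = line.strip()
            if not line.startswith("{"):
                continue  # skip non-JSON output
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or malformed lines
            if not isinstance(entry, dict) or entry.get("context") != "crawlStatus":
                continue
            details = entry.get("details")
            crawled = details.get("crawled") if isinstance(details, dict) else None
            if isinstance(crawled, int) and (best is None or crawled > best):
                best = crawled
    return best
```

Taking the maximum rather than the last match avoids depending on the
relative order of stdout and stderr.
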
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
warc2zim (called internally by zimit) requires --name for ZIM metadata.
Without it, argument validation fails with exit code 2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Must pass `zimit` as command after image name (entrypoint execs args)
- --url → --seeds, --name removed, --lang → --zim-lang, --workers → -w
- Remove --rm so `docker logs` still works after the container exits; remove the container manually afterwards
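
The invocation described in the list above could be assembled roughly as
follows. This is a sketch under assumptions: the helper names, the image
and the container name are hypothetical, and only the flags named in this
message are used. The `--name` shown here is docker's own container-name
option, not the zimit flag that was removed.

```python
from typing import List


def build_zimit_argv(image: str, container: str, seed_url: str,
                     zim_lang: str, workers: int) -> List[str]:
    """Build the docker argv for a zimit run (no --rm, so logs survive exit)."""
    return [
        "docker", "run",
        "--name", container,      # docker option: names the container for later logs/rm
        image,
        "zimit",                  # command must follow the image name
        "--seeds", seed_url,      # formerly --url
        "--zim-lang", zim_lang,   # formerly --lang
        "-w", str(workers),       # formerly --workers
    ]


def build_cleanup_argvs(container: str) -> List[List[str]]:
    """Commands to run after exit: read the logs first, then remove the container."""
    return [
        ["docker", "logs", container],
        ["docker", "rm", container],
    ]
```

Each argv list can be passed to `subprocess.run` as-is; keeping them as
lists avoids shell quoting issues with the seed URL.
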
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Format: {domain}_{lang}_{YYYY-MM}_{job_id}.zim
Prevents zimwriterfs failures when the same domain is scraped
multiple times in the same month.
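
A minimal sketch of the naming scheme, assuming the job ID is already a
filesystem-safe string. The function name `zim_filename` and the domain
sanitizing step are assumptions for illustration.

```python
import re
from datetime import date


def zim_filename(domain: str, lang: str, job_id: str, when: date) -> str:
    """Build {domain}_{lang}_{YYYY-MM}_{job_id}.zim."""
    # Assumption: replace anything outside letters, digits, dot and hyphen
    # so the domain part cannot introduce path separators or underscores.
    safe_domain = re.sub(r"[^A-Za-z0-9.-]", "-", domain)
    return f"{safe_domain}_{lang}_{when:%Y-%m}_{job_id}.zim"
```

Two jobs for the same domain in the same month then differ in the job ID
suffix, so their output names do not collide.
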
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SingleFile CLI has no --crawl-delay option. The invalid flag caused the
process to print help and exit with no output. Added --crawl-no-parent
and --crawl-replace-URLs instead. Removed unused crawl_delay config key.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>