Complete sources now show "19,344 in Qdrant" instead of misleading
extraction counts. Each status gets contextual progress display:
complete → X in Qdrant, processing → X/Y in Qdrant (%),
extracting → X/Y extracted, detected → dash.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Status was showing COMPLETE after ZIM extraction finished, even when
documents were still queued for enrichment/embedding. Now computes
effective_status by checking actual pipeline state per-source:
- DETECTED: ingest not enabled (gray)
- EXTRACTING: ZIM processor running (blue)
- PROCESSING: extracted but docs still in enricher/embedder queue (amber)
- COMPLETE: all docs fully enriched and embedded in Qdrant (green)
Also fixed _build_kiwix_sources pipeline query to filter by category
per-source instead of returning global kiwix stats for every source.
Progress column now shows "X / Y in Qdrant" when processing, or
"X / Y extracted" otherwise.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ZIM processor: extract articles from ZIM files, feed into existing enrichment pipeline
- Dashboard: Kiwix tab with library table, ingest toggle, upload, remove
- kiwix-serve on port 8430, wiki.echo6.co behind Authentik
- Citation URLs point to wiki.echo6.co/{zimname}/{article_path}
- Dashboard shows WIKI type badge for ZIM-sourced content
- Appropedia EN (19,445 articles) fully ingested as proof of concept
Two bugs in the Recently Completed table:
1. Title showed "Untitled" for all transcripts because the dashboard
read documents.book_title (populated by PDF metadata voting) which
is NULL for transcripts. Fixed by COALESCE(book_title, filename)
in the SQL query -- falls back to catalogue.filename which holds
the real video title.
2. Type showed "WEB" for all transcripts because the type CASE
expression only had web and pdf branches, with web matching any
http% path -- and transcript paths are PeerTube watch URLs.
Fixed by adding a transcript branch keyed on catalogue.source =
stream.echo6.co, evaluated before the web branch.
Also adds badge-transcript CSS (purple) and JS rendering case.
Applied consistently to both the Recently Completed and Sources
table queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>