refactored-recon
Design documents for the RECON pipeline refactor. The goal is to restructure RECON's ingestion pipeline into a hopper-based, type-dispatched architecture where new content sources can be added by writing a small acquisition module and a small processor module without touching shared infrastructure.
This repo is design-only. Implementation happens in the RECON repo; this repo tracks the thinking, the decisions, and the phased migration plan with git history so the architecture can evolve visibly over time.
Status
- Design drafted: 2026-04-14
- Implementation status: not started
- Current system: recon.service stopped pending refactor
Documents
- architecture.md — target architecture. The hopper model, processor pattern, lifecycle, contracts.
- current-state.md — where the system is today, what works, what's broken, what's technical debt.
- migration-plan.md — phased plan to get from the current system to the target without data loss or extended downtime.
- decisions.md — architectural decision record. The forks we considered and why we chose what we chose.
- phases/ — detailed per-phase execution plans (to be filled in as each phase is scoped).
Read order
If you're new to this design, read in this order:
1. current-state.md — understand what exists
2. architecture.md — understand the target
3. decisions.md — understand why the target looks the way it does
4. migration-plan.md — understand how we get there
Principles
Three principles shaped every decision in this design. When in doubt on a detail, fall back to these:
Modularity on the edges, uniformity in the middle. Each content source (PDFs, transcripts, HTML, future types) is its own acquisition module and its own processor. They share nothing except the enrich/embed infrastructure and the filesystem contract. Adding a new type touches only the two new modules and one line of config.
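As a rough illustration of dispatch by content type, here is a minimal sketch that assumes a Python implementation. The `PROCESSORS` registry, the `register` decorator, and the two processor functions are hypothetical names invented for this example, not RECON's actual interfaces. The registry entry stands in for the "one line of config".

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry mapping a content type to its processor.
# In a real system this mapping might come from a config file instead.
PROCESSORS: Dict[str, Callable[[Path], str]] = {}


def register(content_type: str):
    """Register a processor for one content type (illustrative only)."""
    def decorator(func: Callable[[Path], str]) -> Callable[[Path], str]:
        PROCESSORS[content_type] = func
        return func
    return decorator


@register("pdf")
def process_pdf(path: Path) -> str:
    # Placeholder body: a real processor would extract text here.
    return f"pdf:{path.name}"


@register("transcript")
def process_transcript(path: Path) -> str:
    # Placeholder body: a real processor would normalise the transcript here.
    return f"transcript:{path.name}"


def dispatch(content_type: str, path: Path) -> str:
    """Route a file to the processor registered for its type."""
    try:
        processor = PROCESSORS[content_type]
    except KeyError:
        raise ValueError(f"no processor registered for {content_type!r}") from None
    return processor(path)
```

Under these assumptions, adding a new type means writing one new decorated function. `dispatch` and the existing processors stay unchanged.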
State is a directory. A file's location on disk tells you what stage of the pipeline it's in. Acquired but unprocessed → sitting in _acquired/. Being worked on → sitting in _processing/. Done → sitting in the library under its final name. No status tracking that isn't reflected in where the file actually lives.
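A small sketch of how a file's stage could be read from its path alone. It assumes that `_acquired/` and `_processing/` sit directly under a single root directory and that anything else under that root is library content. The function name, the root layout, and the stage labels are illustrative assumptions.

```python
from pathlib import Path

# Assumed mapping from top-level directory name to pipeline stage.
STAGE_DIRS = {"_acquired": "acquired", "_processing": "processing"}


def stage_of(path: Path, root: Path) -> str:
    """Infer the pipeline stage of a file purely from where it lives."""
    # relative_to raises ValueError if path is not under root.
    relative = path.relative_to(root)
    if not relative.parts:
        raise ValueError("path must point below the root, not at the root itself")
    # Anything outside the staging directories is treated as finished library content.
    return STAGE_DIRS.get(relative.parts[0], "done")
```

This uses pure path arithmetic, so there is no separate status record that could drift out of sync with the filesystem.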
Small atomic transitions. Files move between stages as complete units with all their metadata updated together — filesystem, catalogue, documents table, and Qdrant payloads in one transition. Partial state is the enemy. If any part of a transition fails, the file stays where it was.
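One way to approximate an all-or-nothing transition is to pair every metadata update with an undo action and to move the file last. The sketch below assumes a Python implementation. The `transition` function and the `(apply, undo)` step pairs are hypothetical stand-ins for the catalogue, documents table, and Qdrant payload updates. This is best-effort compensation rather than a true distributed transaction.

```python
import os
from pathlib import Path
from typing import Callable, List, Tuple

# Each step is an (apply, undo) pair of zero-argument callables.
Step = Tuple[Callable[[], None], Callable[[], None]]


def transition(src: Path, dst: Path, metadata_steps: List[Step]) -> None:
    """Apply all metadata steps, then move the file; roll back on any failure."""
    applied: List[Callable[[], None]] = []
    try:
        for apply, undo in metadata_steps:
            apply()
            applied.append(undo)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # The file moves last, so a failed step leaves it where it was.
        # os.replace is an atomic rename on POSIX when both paths share a filesystem.
        os.replace(src, dst)
    except Exception:
        # Undo the completed steps in reverse order, then re-raise.
        for undo in reversed(applied):
            undo()
        raise
```

Because the rename happens only after every metadata step has succeeded, a failure at any point leaves the file in its original directory, which matches the rule that the file stays where it was.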