# refactored-recon

Design documents for the RECON pipeline refactor. The goal is to restructure RECON's ingestion pipeline into a hopper-based, type-dispatched architecture where new content sources can be added by writing a small acquisition module and a small processor module without touching shared infrastructure.

This repo is design-only. Implementation happens in the RECON repo; this repo tracks the thinking, the decisions, and the phased migration plan with git history so the architecture can evolve visibly over time.

## Status

- Design drafted: 2026-04-14
- Implementation status: not started
- Current system: recon.service stopped pending refactor

## Documents

- [architecture.md](architecture.md): target architecture. The hopper model, processor pattern, lifecycle, contracts.
- [current-state.md](current-state.md): where the system is today, what works, what's broken, what's technical debt.
- [migration-plan.md](migration-plan.md): phased plan to get from current to target without losing data or incurring extended downtime.
- [decisions.md](decisions.md): architectural decision record. The forks we considered and why we chose what we chose.
- [phases/](phases/): detailed per-phase execution plans (to be filled in as each phase is scoped).

## Read order

If you're new to this design, read in this order:

1. `current-state.md`: understand what exists
2. `architecture.md`: understand the target
3. `decisions.md`: understand why the target looks the way it does
4. `migration-plan.md`: understand how we get there

## Principles

Three principles shaped every decision in this design. When in doubt on a detail, fall back to these:

**Modularity on the edges, uniformity in the middle.** Each content source (PDFs, transcripts, HTML, future types) is its own acquisition module and its own processor. They share nothing except the enrich/embed infrastructure and the filesystem contract. Adding a new type touches only the two new modules and one line of config.

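
To make the type-dispatch idea concrete, here is a minimal sketch of how a processor registry might look. Everything in it is an assumption for illustration: the names `PROCESSORS`, `register`, and `dispatch`, the processor signature, and the placeholder return values are hypothetical and not taken from the RECON codebase. The actual contracts are defined in `architecture.md`.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical processor signature: takes a file path, returns a result string.
Processor = Callable[[Path], str]

# Hypothetical registry mapping a content type to its processor.
PROCESSORS: Dict[str, Processor] = {}


def register(content_type: str) -> Callable[[Processor], Processor]:
    """Register a processor for one content type (the 'one line' per new type)."""
    def decorator(func: Processor) -> Processor:
        if content_type in PROCESSORS:
            raise ValueError(f"processor already registered for {content_type!r}")
        PROCESSORS[content_type] = func
        return func
    return decorator


@register("pdf")
def process_pdf(path: Path) -> str:
    # Placeholder body; a real processor would extract and enrich content.
    return f"pdf:{path.name}"


@register("transcript")
def process_transcript(path: Path) -> str:
    # Placeholder body, independent of the PDF processor.
    return f"transcript:{path.name}"


def dispatch(content_type: str, path: Path) -> str:
    """Shared middle: look up the processor by type and run it."""
    try:
        processor = PROCESSORS[content_type]
    except KeyError:
        raise LookupError(f"no processor registered for {content_type!r}") from None
    return processor(path)
```

Under this sketch, adding an HTML source would mean writing one more decorated function; `dispatch` and the other processors stay untouched.
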
**State is a directory.** A file's location on disk tells you what stage of the pipeline it's in. Acquired but unprocessed → sitting in `_acquired/`. Being worked on → sitting in `_processing/`. Done → sitting in the library under its final name. No status tracking that isn't reflected in where the file actually lives.

**Small atomic transitions.** Files move between stages as complete units with all their metadata updated together: filesystem, catalogue, documents table, and Qdrant payloads in one transition. Partial state is the enemy. If any part of a transition fails, the file stays where it was.
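
The last two principles can be sketched together. The code below is a rough illustration under stated assumptions, not the planned implementation: `stage_of`, `transition`, and the `(apply, undo)` step pairs are hypothetical names, and the steps stand in for the catalogue, documents table, and Qdrant updates. It approximates atomicity with compensating rollback, which is weaker than a true multi-system transaction.

```python
import os
from pathlib import Path
from typing import Callable, List, Tuple

ACQUIRED = "_acquired"
PROCESSING = "_processing"


def stage_of(path: Path, root: Path) -> str:
    """Derive the pipeline stage purely from where the file lives under root."""
    first = path.relative_to(root).parts[0]
    return first if first in (ACQUIRED, PROCESSING) else "library"


# Each step is an (apply, undo) pair; hypothetical stand-ins for the
# catalogue, documents table, and Qdrant payload updates.
Step = Tuple[Callable[[], None], Callable[[], None]]


def transition(src: Path, dest: Path, steps: List[Step]) -> Path:
    """Run all metadata steps, then move the file; undo everything on failure."""
    undo_stack: List[Callable[[], None]] = []
    try:
        for apply_step, undo_step in steps:
            apply_step()
            undo_stack.append(undo_step)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # The move goes last. os.replace is atomic on POSIX when src and
        # dest are on the same filesystem.
        os.replace(src, dest)
    except Exception:
        # Roll back completed steps in reverse order; the file never moved.
        for undo_step in reversed(undo_stack):
            undo_step()
        raise
    return dest
```

Putting the file move last means any earlier failure leaves the file in its original directory, so its location still reports the correct stage.
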