Mirror of https://github.com/zvx-echo6/refactored-recon.git, synced 2026-05-20 14:44:39 +02:00
Initial design docs for RECON pipeline refactor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit aa195825e3
7 changed files with 814 additions and 0 deletions
README.md (normal file, 38 additions)
@@ -0,0 +1,38 @@
# refactored-recon

Design documents for the RECON pipeline refactor. The goal is to restructure RECON's ingestion pipeline into a hopper-based, type-dispatched architecture where new content sources can be added by writing a small acquisition module and a small processor module without touching shared infrastructure.

This repo is design-only. Implementation happens in the RECON repo; this repo tracks the thinking, the decisions, and the phased migration plan with git history so the architecture can evolve visibly over time.

## Status

- Design drafted: 2026-04-14
- Implementation status: not started
- Current system: recon.service stopped pending refactor

## Documents
- [architecture.md](architecture.md) — target architecture. The hopper model, processor pattern, lifecycle, contracts.
- [current-state.md](current-state.md) — where the system is today, what works, what's broken, what's technical debt.
- [migration-plan.md](migration-plan.md) — phased plan to get from current to target without losing data or extended downtime.
- [decisions.md](decisions.md) — architectural decision record. The forks we considered and why we chose what we chose.
- [phases/](phases/) — detailed per-phase execution plans (to be filled in as each phase is scoped).

## Read order

If you're new to this design, read in this order:
1. `current-state.md` — understand what exists
2. `architecture.md` — understand the target
3. `decisions.md` — understand why the target looks the way it does
4. `migration-plan.md` — understand how we get there

## Principles

Three principles shaped every decision in this design. When in doubt about a detail, fall back to these:

**Modularity on the edges, uniformity in the middle.** Each content source (PDFs, transcripts, HTML, future types) has its own acquisition module and its own processor. They share nothing except the enrich/embed infrastructure and the filesystem contract. Adding a new type touches only the two new modules and one line of config.
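
One way to picture the "one line of config" idea is a registry that maps a content type to its processor. The sketch below is illustrative only: `PROCESSORS`, `dispatch`, and the two processor functions are hypothetical names, and the real module layout is defined in `architecture.md`, not here.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical processor signature: take a file, return its extracted text.
Processor = Callable[[Path], str]


def process_pdf(path: Path) -> str:
    # Placeholder body; a real processor would parse the PDF.
    return f"pdf text from {path.name}"


def process_transcript(path: Path) -> str:
    # Placeholder body; a real processor would parse the transcript.
    return f"transcript text from {path.name}"


# Assumed stand-in for the "one line of config": one registry entry per type.
PROCESSORS: Dict[str, Processor] = {
    "pdf": process_pdf,
    "transcript": process_transcript,
}


def dispatch(content_type: str, path: Path) -> str:
    """Route a file to the processor registered for its content type."""
    try:
        processor = PROCESSORS[content_type]
    except KeyError:
        raise ValueError(f"no processor registered for type {content_type!r}") from None
    return processor(path)
```

Under these assumptions, supporting a new type means writing its processor and adding one entry to `PROCESSORS`; `dispatch` itself never changes.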

**State is a directory.** A file's location on disk tells you what stage of the pipeline it's in. Acquired but unprocessed → sitting in `_acquired/`. Being worked on → sitting in `_processing/`. Done → sitting in the library under its final name. There is no status tracking that isn't reflected in where the file actually lives.
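
A minimal sketch of what "state is a directory" could look like in code. It assumes `_acquired/` and `_processing/` sit directly under a single library root and that everything else under that root is finished content; `stage_of` and the stage labels are hypothetical names, not part of the design.

```python
from pathlib import Path

# Directory names come from the README; the stage labels are assumptions.
STAGE_DIRS = {
    "_acquired": "acquired",
    "_processing": "processing",
}


def stage_of(path: Path, root: Path) -> str:
    """Infer the pipeline stage purely from where the file sits under root."""
    relative = path.relative_to(root)  # raises ValueError if path is outside root
    if not relative.parts:
        raise ValueError("path must point below the library root")
    # Anything not in a staging directory is treated as done (in the library).
    return STAGE_DIRS.get(relative.parts[0], "done")
```

Because the stage is derived from the path, there is no separate status field that could drift out of sync with the filesystem.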

**Small atomic transitions.** Files move between stages as complete units with all their metadata updated together: filesystem, catalogue, documents table, and Qdrant payloads change in one transition. Partial state is the enemy. If any part of a transition fails, the file stays where it was.
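
The README does not say how transitions are made atomic. One possible approach, sketched below, is to pair every metadata update with an undo action, move the file last, and roll back completed updates if anything fails. `transition` and the `(apply, undo)` step shape are hypothetical; real catalogue, documents-table, and Qdrant updates would replace the callables.

```python
import os
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical step shape: (apply, undo), e.g. a catalogue update and its reversal.
Step = Tuple[Callable[[], None], Callable[[], None]]


def transition(src: Path, dst: Path, steps: List[Step]) -> None:
    """Apply all metadata steps, then move the file; undo everything on failure."""
    undo_stack: List[Callable[[], None]] = []
    try:
        for apply, undo in steps:
            apply()
            undo_stack.append(undo)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # The move comes last, so a failed step leaves the file where it was.
        # os.replace is atomic on POSIX when src and dst share a filesystem.
        os.replace(src, dst)
    except Exception:
        # Roll back completed steps in reverse order, then re-raise.
        for undo in reversed(undo_stack):
            undo()
        raise
```

This is best-effort rather than a true cross-store transaction: an undo action can itself fail, so a real implementation would also need a recovery path for that case.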