mirror of
https://github.com/zvx-echo6/refactored-recon.git
synced 2026-05-20 14:44:39 +02:00
38 lines
2.6 KiB
Markdown
# refactored-recon
Design documents for the RECON pipeline refactor. The goal is to restructure RECON's ingestion pipeline into a hopper-based, type-dispatched architecture where new content sources can be added by writing a small acquisition module and a small processor module without touching shared infrastructure.

This repo is design-only. Implementation happens in the RECON repo; this repo tracks the thinking, the decisions, and the phased migration plan with git history so the architecture can evolve visibly over time.

## Status
- Design drafted: 2026-04-14
- Implementation status: not started
- Current system: `recon.service` stopped pending refactor

## Documents
- [architecture.md](architecture.md): the target architecture, covering the hopper model, processor pattern, lifecycle, and contracts.
- [current-state.md](current-state.md): where the system is today, including what works, what's broken, and what's technical debt.
- [migration-plan.md](migration-plan.md): phased plan to get from current to target without data loss or extended downtime.
- [decisions.md](decisions.md): architectural decision record. The forks we considered and why we chose what we chose.
- [phases/](phases/): detailed per-phase execution plans (to be filled in as each phase is scoped).

## Read order
If you're new to this design, read in this order:

1. `current-state.md`: understand what exists
2. `architecture.md`: understand the target
3. `decisions.md`: understand why the target looks the way it does
4. `migration-plan.md`: understand how we get there

## Principles
Three principles shaped every decision in this design. When in doubt on a detail, fall back to these:

**Modularity on the edges, uniformity in the middle.** Each content source (PDFs, transcripts, HTML, future types) is its own acquisition module and its own processor. They share nothing except the enrich/embed infrastructure and the filesystem contract. Adding a new type touches only the two new modules and one line of config.
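The modularity principle can be pictured as a small type-dispatch registry. The sketch below is illustrative only and is not taken from the RECON codebase: `SourceType`, `REGISTRY`, `register`, and `ingest` are hypothetical names, the registry stands in for the "one line of config", and the two lambdas stand in for a real acquisition module and processor.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SourceType:
    """One content source: an acquisition step paired with a processor."""
    acquire: Callable[[str], str]  # hypothetical: source reference -> raw text
    process: Callable[[str], str]  # hypothetical: raw text -> normalized text


# Hypothetical registry; in this sketch it plays the role of the config line.
REGISTRY: Dict[str, SourceType] = {}


def register(name: str, acquire: Callable[[str], str],
             process: Callable[[str], str]) -> None:
    """Add a content type without touching any shared code path."""
    if name in REGISTRY:
        raise ValueError(f"content type already registered: {name}")
    REGISTRY[name] = SourceType(acquire, process)


def ingest(name: str, ref: str) -> str:
    """Shared middle: look up the type, then run its two edge modules."""
    source = REGISTRY[name]  # KeyError for an unregistered type
    return source.process(source.acquire(ref))


# A made-up "transcript" type: two small callables and one registration.
register(
    "transcript",
    acquire=lambda ref: f"  raw transcript for {ref}  ",
    process=str.strip,
)
```

With this shape, `ingest("transcript", "ep-12")` never needs to change when another type is registered; only the new pair of callables and its registration line are added.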

**State is a directory.** A file's location on disk tells you what stage of the pipeline it's in. Acquired but unprocessed → sitting in `_acquired/`. Being worked on → sitting in `_processing/`. Done → sitting in the library under its final name. No status tracking that isn't reflected in where the file actually lives.
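As a rough illustration of "state is a directory", a file's stage can be derived from its path alone, with no status column to consult. This is a minimal sketch under stated assumptions: only the `_acquired/` and `_processing/` directory names come from this document, while the `stage_of` function, the single shared root directory, and the stage labels are hypothetical.

```python
from pathlib import Path

# Directory names are from this README; the stage labels are illustrative.
STAGE_DIRS = {"_acquired": "acquired", "_processing": "processing"}


def stage_of(path: Path, root: Path) -> str:
    """Return the pipeline stage implied by where `path` sits under `root`.

    Assumes the hopper directories live directly under `root`, next to the
    library content. Anything outside the hopper directories counts as done.
    """
    relative = path.relative_to(root)  # ValueError if path is not under root
    if not relative.parts:
        raise ValueError("expected a file path below the root, got the root")
    return STAGE_DIRS.get(relative.parts[0], "library")
```

The function only inspects the path and never touches the filesystem, so it gives the same answer for a live file and for a path recorded in a log.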

**Small atomic transitions.** Files move between stages as complete units, with all their metadata updated together: filesystem, catalogue, documents table, and Qdrant payloads in one transition. Partial state is the enemy. If any part of a transition fails, the file stays where it was.
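One way to approximate such a transition is to apply each metadata update with a paired undo, and to move the file only as the last step. The sketch below assumes that approach; it is not the design's actual mechanism, and it is compensation rather than a true cross-system transaction. `transition`, `Step`, and `demo` are hypothetical names, and a plain dict stands in for the catalogue.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Sequence, Tuple

Step = Tuple[Callable[[], object], Callable[[], object]]  # (apply, undo)


def transition(src: Path, dst: Path, steps: Sequence[Step]) -> None:
    """Apply every metadata step, then move the file; undo all on failure."""
    undo_stack: List[Callable[[], object]] = []
    try:
        for apply, undo in steps:
            apply()
            undo_stack.append(undo)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # os.replace is an atomic rename when src and dst share a filesystem.
        os.replace(src, dst)
    except Exception:
        # Roll back completed steps in reverse order; the file never moved.
        for undo in reversed(undo_stack):
            undo()
        raise


def demo() -> dict:
    """Run one failing and one successful transition in a temp directory."""
    catalogue: dict = {}  # stand-in for the real catalogue
    record: Step = (
        lambda: catalogue.update(report="library"),
        lambda: catalogue.pop("report", None),
    )

    def failing() -> None:
        raise RuntimeError("simulated failure in a later step")

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "_processing" / "report.pdf"
        dst = Path(tmp) / "library" / "report.pdf"
        src.parent.mkdir()
        src.write_text("content")
        try:
            transition(src, dst, [record, (failing, lambda: None)])
        except RuntimeError:
            pass
        after_failure = (src.exists(), dst.exists(), dict(catalogue))
        transition(src, dst, [record])
        after_success = (src.exists(), dst.exists(), dict(catalogue))
    return {"after_failure": after_failure, "after_success": after_success}
```

Putting the rename last means a failed metadata step leaves the file exactly where it was. A failed undo is not handled here and would still leave partial state, which a real implementation would need to address.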