refactored-recon/phases/phase-1-scaffolding.md
Matt b697404df2 Phase 1: scaffolding (directories, config, text_dir column)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:48:27 +00:00


# Phase 1: Scaffolding
**Executed:** 2026-04-14T14:45Z
---
## Backups (taken before changes)
| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db | CT 130: `/tmp/recon.db.phase1.20260414.bak` | `69d94a2c21686871c8c6863903710e3f` |
| config.yaml | CT 130: `/tmp/config.yaml.phase1.20260414.bak` | `6d70ed572dfb2e704abca3850ae33797` |
DB hash matches Phase 0 backup — no changes occurred between phases.
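
The hash comparison above can be reproduced with a short script. This is a minimal sketch, not the procedure actually used for this phase; the helper names `md5_of` and `verify_backup` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_md5: str) -> bool:
    """True when the file's MD5 matches the recorded hash (case-insensitive)."""
    return md5_of(path) == expected_md5.strip().lower()
```

On CT 130 this would be called as `verify_backup(Path("/tmp/recon.db.phase1.20260414.bak"), "69d94a2c21686871c8c6863903710e3f")`, and likewise for the `config.yaml` backup.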
---
## What Changed
### 1. Filesystem: New directory tree
Created under `/opt/recon/data/`:
```
acquired/
    README.md
    pdf/.keep
    stream/.keep
    html/.keep
processing/
    README.md
```
```
All owned by `zvx:zvx`, matching the existing data directory.
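
For repeatability, the same tree could be created idempotently from Python. This is a sketch under assumptions: `scaffold` is a hypothetical helper, the README files are written empty because their real contents are not recorded here, and ownership (`zvx:zvx`) is not handled.

```python
from pathlib import Path

# Subdirectories of acquired/ that each hold a .keep placeholder.
ACQUIRED_KINDS = ("pdf", "stream", "html")


def scaffold(data_root: Path) -> list[Path]:
    """Create the acquired/ and processing/ trees; return the five directories."""
    acquired = data_root / "acquired"
    processing = data_root / "processing"
    dirs = [acquired, *(acquired / kind for kind in ACQUIRED_KINDS), processing]
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)
    for kind in ACQUIRED_KINDS:
        (acquired / kind / ".keep").touch()
    for top in (acquired, processing):
        readme = top / "README.md"
        if not readme.exists():
            readme.write_text("")  # placeholder only; real README text not shown
    return dirs
```

Calling `scaffold(Path("/opt/recon/data"))` a second time leaves existing files untouched, so the script is safe to re-run. Setting ownership would be a separate step, for example with `shutil.chown`.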
### 2. Config: Three edits to `/opt/recon/config.yaml`
**a) `new_pipeline.enabled` set to `false`**
The Stream B library pipeline (watchdog-driven file intake from `_acquired/` and `_ingest/`) is disabled. This prevents the old pipeline from processing files while we build the replacement.
**b) `crawler.sites` set to `[]`**
All 44 crawl target site definitions were commented out and preserved as historical reference. If started, the crawler scheduler will find zero sites and do nothing.
**c) New `pipeline:` section added at end of file**
```yaml
pipeline:
  acquired_root: /opt/recon/data/acquired
  processing_root: /opt/recon/data/processing
  dispatch:
    pdf: pdf_processor
    stream: transcript_processor
    html: html_processor
  mtime_stability_seconds: 10
```
This is scaffolding only: no code reads this section yet, and the named processors do not exist.
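
To make the intent of these keys concrete, here is one way a future dispatcher might interpret them. Everything in this sketch is an assumption: the functions `processor_for` and `is_stable` are hypothetical, the config is mirrored as a dict to avoid a YAML dependency, `mtime_stability_seconds` is taken to sit directly under `pipeline:` rather than under `dispatch:`, and the rule "the first subdirectory under `acquired_root` selects the processor" is a guess at the design.

```python
import time
from pathlib import PurePosixPath
from typing import Optional

# Hand-written mirror of the pipeline: section (assumed nesting).
PIPELINE = {
    "acquired_root": "/opt/recon/data/acquired",
    "processing_root": "/opt/recon/data/processing",
    "dispatch": {
        "pdf": "pdf_processor",
        "stream": "transcript_processor",
        "html": "html_processor",
    },
    "mtime_stability_seconds": 10,
}


def processor_for(path: PurePosixPath, config: dict = PIPELINE) -> Optional[str]:
    """Map a file under acquired_root to a processor name via its first subdirectory."""
    try:
        relative = path.relative_to(config["acquired_root"])
    except ValueError:
        return None  # path is not under acquired_root
    if len(relative.parts) < 2:
        return None  # file sits directly in acquired_root, e.g. README.md
    return config["dispatch"].get(relative.parts[0])


def is_stable(mtime: float, config: dict = PIPELINE, now: Optional[float] = None) -> bool:
    """True once a file has gone unmodified for mtime_stability_seconds."""
    now = time.time() if now is None else now
    return now - mtime >= config["mtime_stability_seconds"]
```

Under these assumptions, a file would be handed to its processor only when `processor_for` returns a name and `is_stable` reports that its modification time is at least 10 seconds old, which guards against picking up files that are still being written.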
**Config diff stats:** 284 lines removed, 302 lines added. Most of the diff is the 44 site definitions, which were removed as live entries and re-added as comments.
### 3. Schema: `text_dir` column added to `documents` table
```sql
ALTER TABLE documents ADD COLUMN text_dir TEXT;
```
All 29,812 existing rows have `text_dir = NULL`. This column will hold the path to each document's extracted text directory, replacing the convention-based `data/text/{hash}/` lookup.
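
The migration and its check can be scripted so that re-running it is harmless. This sketch assumes `recon.db` is a SQLite database (the filename suggests it, but that is not confirmed here); the function names are hypothetical.

```python
import sqlite3


def add_text_dir_column(conn: sqlite3.Connection) -> bool:
    """Add documents.text_dir if missing; return True when the column was added."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
    if "text_dir" in columns:
        return False
    conn.execute("ALTER TABLE documents ADD COLUMN text_dir TEXT")
    conn.commit()
    return True


def null_text_dir_count(conn: sqlite3.Connection) -> int:
    """Count rows whose text_dir is still NULL."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE text_dir IS NULL"
    ).fetchone()
    return count
```

Immediately after the migration, `null_text_dir_count` should equal the total row count of `documents`, since the new column has no default.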
---
## What Did Not Change
- **No code modified:** `recon.py`, `lib/`, `scripts/`, templates, static assets — all untouched
- **No data modified:** catalogue and documents row counts remain 29,812 each
- **No service state changed:** Both `recon.service` and `recon-watchdog.service` remain inactive (both still `enabled` — will auto-start on reboot)
- **No Qdrant changes:** Collection `recon_knowledge_hybrid` untouched (2,320,695 points)
- **No file moves or deletions:** Existing `data/text/`, `data/concepts/`, NFS mounts all untouched
---
## Verification (post-change)
| Check | Result |
|-------|--------|
| recon.service | inactive |
| recon-watchdog.service | inactive |
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| text_dir NULL count | 29,812 (all rows) |
| new_pipeline.enabled | `false` |
| crawler.sites | `[]` |
| pipeline.acquired_root | `/opt/recon/data/acquired` |
| New directories exist | all 5 confirmed, zvx:zvx |
| YAML validates | yes |