refactored-recon/phases/phase-1-scaffolding.md

# Phase 1: Scaffolding

**Executed:** 2026-04-14T14:45Z

---

## Backups (taken before changes)

| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db | CT 130: `/tmp/recon.db.phase1.20260414.bak` | `69d94a2c21686871c8c6863903710e3f` |
| config.yaml | CT 130: `/tmp/config.yaml.phase1.20260414.bak` | `6d70ed572dfb2e704abca3850ae33797` |

The DB hash matches the Phase 0 backup, confirming that the database did not change between phases.
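The hash comparison above can be reproduced with a few lines of Python. This is a minimal sketch, not the command that was actually run; the helper names `md5_of_file` and `backup_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(path: Path, expected_md5: str) -> bool:
    """Compare a backup file's digest with a recorded value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

Called on CT 130 as `backup_matches(Path("/tmp/recon.db.phase1.20260414.bak"), "69d94a2c21686871c8c6863903710e3f")`, it should return `True` if the backup is intact.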

---

## What Changed

### 1. Filesystem: New directory tree

Created under `/opt/recon/data/`:

```
acquired/
  README.md
  pdf/.keep
  stream/.keep
  html/.keep
processing/
  README.md
```

All owned by `zvx:zvx`, matching the existing data directory.
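For reference, the tree above could be recreated idempotently with a sketch like the following. The `LAYOUT` table and the `scaffold` function are illustrative names, not part of the recon codebase, and the sketch creates empty placeholder files rather than the real README contents.

```python
from pathlib import Path

# Layout copied from the tree above: each directory maps to the files inside it.
LAYOUT = {
    "acquired": ["README.md"],
    "acquired/pdf": [".keep"],
    "acquired/stream": [".keep"],
    "acquired/html": [".keep"],
    "processing": ["README.md"],
}


def scaffold(data_root: Path) -> list:
    """Create the directory tree if missing and return the directory paths."""
    directories = []
    for rel_dir, files in LAYOUT.items():
        directory = data_root / rel_dir
        # exist_ok makes a second run a no-op instead of an error.
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            (directory / name).touch(exist_ok=True)
        directories.append(directory)
    return directories
```

Ownership is not handled here; on the real host it would still need to be set to `zvx:zvx` separately, for example with `shutil.chown(path, user="zvx", group="zvx")` on each path.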

### 2. Config: Three edits to `/opt/recon/config.yaml`

**a) `new_pipeline.enabled` set to `false`**

The Stream B library pipeline (watchdog-driven file intake from `_acquired/` and `_ingest/`) is disabled. Despite its name, the `new_pipeline` key controls this existing pipeline, so setting it to `false` keeps the old pipeline from processing files while the replacement is built.

**b) `crawler.sites` set to `[]`**

All 44 crawl-target site definitions are commented out and preserved as historical reference. The crawler scheduler will find zero sites and do nothing if started.

**c) New `pipeline:` section added at end of file**

```yaml
pipeline:
  acquired_root: /opt/recon/data/acquired
  processing_root: /opt/recon/data/processing
  dispatch:
    pdf: pdf_processor
    stream: transcript_processor
    html: html_processor
  mtime_stability_seconds: 10
```

Scaffolding only: no code reads this section yet, and the processors named under `dispatch` do not exist.
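To make the intent of the new keys concrete, here is one way a future reader of this section might use them. Everything in this sketch is an assumption: it supposes that `dispatch` is keyed by the first subdirectory under `acquired_root`, and that `mtime_stability_seconds` means "skip a file until it has gone unmodified for that many seconds". The functions `is_stable` and `processor_for` are hypothetical and do not exist in the codebase.

```python
import time
from pathlib import Path
from typing import Optional

# Mirrors the `pipeline:` section as a plain dict, so no YAML parser is needed.
PIPELINE = {
    "acquired_root": "/opt/recon/data/acquired",
    "dispatch": {
        "pdf": "pdf_processor",
        "stream": "transcript_processor",
        "html": "html_processor",
    },
    "mtime_stability_seconds": 10,
}


def is_stable(path: Path, stability_seconds: float, now: Optional[float] = None) -> bool:
    """True when the file has not been modified for at least stability_seconds."""
    now = time.time() if now is None else now
    return now - path.stat().st_mtime >= stability_seconds


def processor_for(path: Path, acquired_root: Path, dispatch: dict) -> Optional[str]:
    """Pick a processor name from the first path component under acquired_root."""
    try:
        kind = path.relative_to(acquired_root).parts[0]
    except (ValueError, IndexError):
        # Path is outside acquired_root, or is acquired_root itself.
        return None
    # Files directly under acquired_root (e.g. README.md) have no dispatch entry.
    return dispatch.get(kind)
```

Under these assumptions, a file at `acquired/pdf/report.pdf` would map to `pdf_processor` once it has been untouched for 10 seconds, which guards against picking up a file that is still being written.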

**Config diff stats:** 284 lines removed, 302 lines added. The bulk is the 44 site definitions, which the diff shows as removed in their active form and added back as comments.

### 3. Schema: `text_dir` column added to `documents` table

```sql
ALTER TABLE documents ADD COLUMN text_dir TEXT;
```

All 29,812 existing rows have `text_dir` set to `NULL`. This column will hold the path to each document's extracted text directory, replacing the convention-based `data/text/{hash}/` lookup.
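The migration and its NULL-count check can be expressed as a re-runnable sketch. It assumes `recon.db` is a SQLite database, which the file name and `ALTER TABLE` syntax suggest but this report does not state; the function names are hypothetical.

```python
import sqlite3


def add_text_dir_column(conn: sqlite3.Connection) -> bool:
    """Add documents.text_dir if it is missing; return True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
    if "text_dir" in columns:
        return False
    conn.execute("ALTER TABLE documents ADD COLUMN text_dir TEXT")
    conn.commit()
    return True


def null_text_dir_count(conn: sqlite3.Connection) -> int:
    """Count rows whose text_dir has not been populated yet."""
    row = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE text_dir IS NULL"
    ).fetchone()
    return row[0]
```

Checking for the column first makes the step safe to repeat, since SQLite raises an error if `ADD COLUMN` names a column that already exists. Immediately after the migration, `null_text_dir_count` should equal the total row count.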

---

## What Did Not Change

- **No code modified:** `recon.py`, `lib/`, `scripts/`, templates, and static assets are all untouched
- **No data modified:** catalogue and documents row counts remain 29,812 each
- **No service state changed:** Both `recon.service` and `recon-watchdog.service` remain inactive. Both are still `enabled`, so they will auto-start on reboot.
- **No Qdrant changes:** Collection `recon_knowledge_hybrid` untouched (2,320,695 points)
- **No file moves or deletions:** Existing `data/text/`, `data/concepts/`, NFS mounts all untouched

---

## Verification (post-change)

| Check | Result |
|-------|--------|
| recon.service | inactive |
| recon-watchdog.service | inactive |
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| text_dir NULL count | 29,812 (all rows) |
| new_pipeline.enabled | `false` |
| crawler.sites | `[]` |
| pipeline.acquired_root | `/opt/recon/data/acquired` |
| New directories exist | all 5 confirmed, zvx:zvx |
| YAML validates | yes |