refactored-recon/phases/phase-1-scaffolding.md
Matt b697404df2 Phase 1: scaffolding (directories, config, text_dir column)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:48:27 +00:00


Phase 1: Scaffolding

Executed: 2026-04-14T14:45Z


Backups (taken before changes)

| Item | Location | MD5 hash |
|---|---|---|
| recon.db | CT 130: /tmp/recon.db.phase1.20260414.bak | 69d94a2c21686871c8c6863903710e3f |
| config.yaml | CT 130: /tmp/config.yaml.phase1.20260414.bak | 6d70ed572dfb2e704abca3850ae33797 |

DB hash matches Phase 0 backup — no changes occurred between phases.
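
A hash comparison like the one above can be repeated at any time. The following is a minimal sketch, not the command actually used for this phase; the helper names `md5_of_file` and `backup_matches` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(path: Path, expected_md5: str) -> bool:
    """True when the file at `path` hashes to the recorded MD5 (hypothetical helper)."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `backup_matches(Path("/tmp/recon.db.phase1.20260414.bak"), "69d94a2c21686871c8c6863903710e3f")` would re-check the database backup against the hash recorded in the table.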


What Changed

1. Filesystem: New directory tree

Created under /opt/recon/data/:

acquired/
  README.md
  pdf/.keep
  stream/.keep
  html/.keep
processing/
  README.md

All owned by zvx:zvx, matching the existing data directory.
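
The tree above could be recreated with a short script. This is a sketch under assumptions: the `LAYOUT` mapping and the `scaffold` function are illustrative names, and the sketch does not set ownership (changing owner to zvx:zvx would need appropriate privileges).

```python
from pathlib import Path

# Layout copied from the listing above: directory -> placeholder files inside it.
LAYOUT = {
    "acquired": ["README.md"],
    "acquired/pdf": [".keep"],
    "acquired/stream": [".keep"],
    "acquired/html": [".keep"],
    "processing": ["README.md"],
}


def scaffold(data_root: Path) -> list[Path]:
    """Create the directories and empty placeholder files; return the directories."""
    created = []
    for rel_dir, files in LAYOUT.items():
        directory = data_root / rel_dir
        directory.mkdir(parents=True, exist_ok=True)  # safe to re-run
        for name in files:
            (directory / name).touch(exist_ok=True)
        created.append(directory)
    return created
```

Calling `scaffold(Path("/opt/recon/data"))` yields the five directories listed above; re-running it is harmless because existing directories and files are left in place.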

2. Config: Three edits to /opt/recon/config.yaml

a) new_pipeline.enabled set to false

This disables the Stream B library pipeline (watchdog-driven file intake from _acquired/ and _ingest/). Despite the key's name, this is the old pipeline; disabling it prevents it from processing files while we build the replacement.

b) crawler.sites set to []

All 44 crawl target site definitions are commented out and preserved as a historical reference. If started, the crawler scheduler will find zero sites and do nothing.

c) New pipeline: section added at end of file

pipeline:
  acquired_root: /opt/recon/data/acquired
  processing_root: /opt/recon/data/processing
  dispatch:
    pdf: pdf_processor
    stream: transcript_processor
    html: html_processor
  mtime_stability_seconds: 10

Scaffolding only: no code reads this section yet, and the processors named under dispatch do not exist.
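
Since nothing consumes this section yet, the following is only a guess at how a future dispatcher might interpret it. It assumes that the keys under dispatch correspond to the subdirectories of acquired_root, and that mtime_stability_seconds means "skip files modified more recently than this". The function names `is_stable` and `processor_for` are hypothetical, and the config is inlined as a dict rather than loaded from config.yaml.

```python
import time
from pathlib import Path
from typing import Optional

# Mirrors the pipeline: section above; a real reader would load it from config.yaml.
PIPELINE = {
    "acquired_root": "/opt/recon/data/acquired",
    "processing_root": "/opt/recon/data/processing",
    "dispatch": {
        "pdf": "pdf_processor",
        "stream": "transcript_processor",
        "html": "html_processor",
    },
    "mtime_stability_seconds": 10,
}


def is_stable(path: Path, stability_seconds: float, now: Optional[float] = None) -> bool:
    """Assumed semantics: the file was last modified at least stability_seconds ago."""
    current = time.time() if now is None else now
    return current - path.stat().st_mtime >= stability_seconds


def processor_for(path: Path, config: dict) -> Optional[str]:
    """Map a file to a processor name by its first directory under acquired_root."""
    try:
        relative = path.relative_to(config["acquired_root"])
    except ValueError:
        return None  # not under acquired_root
    if len(relative.parts) < 2 or relative.name.startswith("."):
        return None  # top-level files (README.md) and placeholders (.keep)
    return config["dispatch"].get(relative.parts[0])
```

Under these assumptions, a file at /opt/recon/data/acquired/pdf/report.pdf would map to pdf_processor once it has been unmodified for 10 seconds, while README.md and the .keep placeholders would be ignored.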

Config diff stats: 284 lines removed, 302 lines added. The bulk is the 44 site definitions, which appear as removed in their active form and added back in commented-out form.

3. Schema: text_dir column added to documents table

ALTER TABLE documents ADD COLUMN text_dir TEXT;

All 29,812 existing rows have text_dir = NULL. This column will hold the path to each document's extracted text directory, replacing the convention-based data/text/{hash}/ lookup.
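
The migration and its NULL-count check can be expressed as a re-runnable sketch. It assumes recon.db is a SQLite database, which this document does not state explicitly; the function names are hypothetical.

```python
import sqlite3


def add_text_dir_column(conn: sqlite3.Connection) -> bool:
    """Add documents.text_dir if missing; return True when the column was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
    if "text_dir" in columns:
        return False  # already migrated, nothing to do
    conn.execute("ALTER TABLE documents ADD COLUMN text_dir TEXT")
    conn.commit()
    return True


def null_text_dir_count(conn: sqlite3.Connection) -> int:
    """Count rows whose text_dir has not been populated yet."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE text_dir IS NULL"
    ).fetchone()
    return count
```

Checking for the column first makes the step idempotent: a plain ALTER TABLE ... ADD COLUMN fails in SQLite if the column already exists. Immediately after the migration, `null_text_dir_count` should equal the total row count of documents.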


What Did Not Change

  • No code modified: recon.py, lib/, scripts/, templates, static assets — all untouched
  • No data modified: catalogue and documents row counts remain 29,812 each
  • No service state changed: Both recon.service and recon-watchdog.service remain inactive (both are still enabled, so they will auto-start on reboot)
  • No Qdrant changes: Collection recon_knowledge_hybrid untouched (2,320,695 points)
  • No file moves or deletions: Existing data/text/, data/concepts/, NFS mounts all untouched

Verification (post-change)

| Check | Result |
|---|---|
| recon.service | inactive |
| recon-watchdog.service | inactive |
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| text_dir NULL count | 29,812 (all rows) |
| new_pipeline.enabled | false |
| crawler.sites | [] |
| pipeline.acquired_root | /opt/recon/data/acquired |
| New directories exist | all 5 confirmed, zvx:zvx |
| YAML validates | yes |