mirror of
https://github.com/zvx-echo6/refactored-recon.git
synced 2026-05-20 14:44:39 +02:00
# Phase 1: Scaffolding

**Executed:** 2026-04-14T14:45Z UTC

---

## Backups (taken before changes)

| Item | Location | MD5 Hash |
|------|----------|----------|
| recon.db | CT 130: `/tmp/recon.db.phase1.20260414.bak` | `69d94a2c21686871c8c6863903710e3f` |
| config.yaml | CT 130: `/tmp/config.yaml.phase1.20260414.bak` | `6d70ed572dfb2e704abca3850ae33797` |

The DB hash matches the Phase 0 backup, so no changes occurred between phases.
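
The hash comparison above can be reproduced with a few lines of Python. This is a minimal sketch, not the procedure actually used for this phase: the helper names `md5_of_file` and `backup_matches` are hypothetical. The path and hash come from the table above, for example `backup_matches(Path("/tmp/recon.db.phase1.20260414.bak"), "69d94a2c21686871c8c6863903710e3f")`.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 against a recorded hash, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```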

---

## What Changed

### 1. Filesystem: New directory tree

Created under `/opt/recon/data/`:

```
acquired/
  README.md
  pdf/.keep
  stream/.keep
  html/.keep
processing/
  README.md
```

All owned by `zvx:zvx`, matching the existing data directory.
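
For reference, the tree above could be recreated with a short script. This is a sketch under assumptions: the function name `create_scaffold` and the two constants are hypothetical, the placeholder files are created empty (the real `README.md` contents are not reproduced here), and ownership is left to the caller.

```python
from pathlib import Path

# Relative paths mirror the tree listed above.
SCAFFOLD_DIRS = [
    "acquired",
    "acquired/pdf",
    "acquired/stream",
    "acquired/html",
    "processing",
]
SCAFFOLD_FILES = [
    "acquired/README.md",
    "acquired/pdf/.keep",
    "acquired/stream/.keep",
    "acquired/html/.keep",
    "processing/README.md",
]


def create_scaffold(data_root: Path) -> list[Path]:
    """Create the directories and empty placeholder files; return the directories."""
    directories = []
    for rel in SCAFFOLD_DIRS:
        directory = data_root / rel
        # exist_ok makes the function safe to run more than once.
        directory.mkdir(parents=True, exist_ok=True)
        directories.append(directory)
    for rel in SCAFFOLD_FILES:
        (data_root / rel).touch(exist_ok=True)
    return directories
```

Calling `create_scaffold(Path("/opt/recon/data"))` would yield the five directories listed in the verification table. Setting `zvx:zvx` ownership is a separate step, for instance with `shutil.chown(path, user="zvx", group="zvx")` run with sufficient privileges.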

### 2. Config: Three edits to `/opt/recon/config.yaml`

**a) `new_pipeline.enabled` set to `false`**

The Stream B library pipeline (watchdog-driven file intake from `_acquired/` and `_ingest/`) is disabled. This prevents the old pipeline from processing files while we build the replacement.

**b) `crawler.sites` set to `[]`**

All 44 crawl-target site definitions were commented out and preserved as historical reference. The crawler scheduler will find zero sites and do nothing if started.

**c) New `pipeline:` section added at end of file**

```yaml
pipeline:
  acquired_root: /opt/recon/data/acquired
  processing_root: /opt/recon/data/processing
  dispatch:
    pdf: pdf_processor
    stream: transcript_processor
    html: html_processor
  mtime_stability_seconds: 10
```

Scaffolding only: no code reads this section yet, and the processors it names do not exist.

**Config diff stats:** 284 lines removed, 302 lines added (the bulk is the 44 site definitions being commented out).
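
Since nothing consumes the `pipeline:` section yet, the following is only one possible reading of it, sketched in Python. It assumes that each `dispatch` key names a subdirectory of `acquired_root`, and that `mtime_stability_seconds` is a quiet period a file must pass before it is picked up. Both are assumptions, as are the names `processor_for` and `is_stable`. The config is mirrored as a plain dict so the sketch needs no YAML parser.

```python
import time
from pathlib import Path
from typing import Optional

# Mirrors the `pipeline:` section of config.yaml.
PIPELINE = {
    "acquired_root": "/opt/recon/data/acquired",
    "processing_root": "/opt/recon/data/processing",
    "dispatch": {
        "pdf": "pdf_processor",
        "stream": "transcript_processor",
        "html": "html_processor",
    },
    "mtime_stability_seconds": 10,
}


def processor_for(path: Path, config: dict) -> Optional[str]:
    """Return the processor name for a file, keyed by its first directory under acquired_root."""
    try:
        relative = path.relative_to(config["acquired_root"])
    except ValueError:
        return None  # the file is not under acquired_root
    if len(relative.parts) < 2 or path.name.startswith("."):
        return None  # top-level files (README.md) and dotfiles (.keep) are skipped
    return config["dispatch"].get(relative.parts[0])


def is_stable(path: Path, config: dict, now: Optional[float] = None) -> bool:
    """True when the file has gone unmodified for at least mtime_stability_seconds."""
    now = time.time() if now is None else now
    return now - path.stat().st_mtime >= config["mtime_stability_seconds"]
```

Under these assumptions, `processor_for(Path("/opt/recon/data/acquired/pdf/report.pdf"), PIPELINE)` returns `"pdf_processor"`, while a file outside `acquired_root` returns `None`. The stability check is a common guard against picking up files that are still being written.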

### 3. Schema: `text_dir` column added to `documents` table

```sql
ALTER TABLE documents ADD COLUMN text_dir TEXT;
```

All 29,812 existing rows have `text_dir = NULL`. This column will hold the path to each document's extracted text directory, replacing the convention-based `data/text/{hash}/` lookup.
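
The schema change and its NULL-count check can be sketched as a repeatable migration. This assumes `recon.db` is a SQLite database, which the file name suggests but this report does not state. The helper names `add_text_dir_column` and `count_null_text_dir` are hypothetical. On the real database, `count_null_text_dir` would be expected to return 29,812 right after the migration.

```python
import sqlite3


def add_text_dir_column(conn: sqlite3.Connection) -> bool:
    """Add documents.text_dir if it is missing; return True when the column was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
    if "text_dir" in columns:
        return False
    conn.execute("ALTER TABLE documents ADD COLUMN text_dir TEXT")
    conn.commit()
    return True


def count_null_text_dir(conn: sqlite3.Connection) -> int:
    """Count rows whose text_dir has not been populated yet."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM documents WHERE text_dir IS NULL"
    ).fetchone()
    return count
```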

---

## What Did Not Change

- **No code modified:** `recon.py`, `lib/`, `scripts/`, templates, and static assets are all untouched
- **No data modified:** catalogue and documents row counts remain 29,812 each
- **No service state changed:** Both `recon.service` and `recon-watchdog.service` remain inactive (both are still `enabled`, so they will auto-start on reboot)
- **No Qdrant changes:** Collection `recon_knowledge_hybrid` untouched (2,320,695 points)
- **No file moves or deletions:** Existing `data/text/`, `data/concepts/`, and NFS mounts are all untouched

---

## Verification (post-change)

| Check | Result |
|-------|--------|
| recon.service | inactive |
| recon-watchdog.service | inactive |
| catalogue rows | 29,812 |
| documents rows | 29,812 |
| text_dir NULL count | 29,812 (all rows) |
| new_pipeline.enabled | `false` |
| crawler.sites | `[]` |
| pipeline.acquired_root | `/opt/recon/data/acquired` |
| New directories exist | all 5 confirmed, `zvx:zvx` |
| YAML validates | yes |