diff --git a/PROJECT-BIBLE.md b/PROJECT-BIBLE.md
index 666dc12..7e8402e 100644
--- a/PROJECT-BIBLE.md
+++ b/PROJECT-BIBLE.md
@@ -37,10 +37,10 @@ Search returns page-grounded citations back to the file or stream URL.
 ```
 ┌─────────────────────────┐
 │ CT 130 (recon) │
- Library (NFS) │ /opt/recon/ │ ┌──────────────┐
- pi-nas:/export/library│ ├─ data/ │ │ Qdrant │
+ Library (LXC bind) │ /opt/recon/ │ ┌──────────────┐
+ data /mnt/data/library│ ├─ data/ │ │ Qdrant │
 → /mnt/library/ │ │ ├─ acquired/ │ │ cortex:6333 │
- (read/write) │ │ ├─ processing/ │ ←→ │ recon_knowledge_hybrid
+ (read/write, local SSD)│ │ ├─ processing/ │ ←→ │ recon_knowledge_hybrid
 │ │ ├─ concepts/ │ │ (1024-d dense + sparse)
 │ │ └─ recon.db │ └──────────────┘
 │ ├─ lib/ │
@@ -358,7 +358,7 @@ Lives at `/opt/recon/config.yaml`. Secrets (`GEMINI_KEYS`,
 ### Top-level keys
 | Key | Meaning |
 |---|---|
-| `library_root` | `/mnt/library` — NFS mount root |
+| `library_root` | `/mnt/library` — LXC bind-mount root (data host `/mnt/data/library`, local SSD) |
 | `processing` | Worker counts, window sizes, timeouts, retry policy |
 | `embedding` | TEI host/port, model (`bge-m3`), 1024-d dense |
 | `sparse_embedding` | Separate service on cortex:8091 |
@@ -481,7 +481,7 @@ and knowledge-stats panels so the dashboard loads instantly.
 │ └── organizer.py # determine_dominant_domain, level 1-4 naming
 └── logs/
 
-/mnt/library/ # NFS from pi-nas, read-write
+/mnt/library/ # LXC bind-mount from data host /mnt/data/library (local SSD), read-write
 ├── //.
 └── _acquired/ _review/ _staging/ signal-archive/ # not touched by pipeline
 ```
@@ -521,6 +521,13 @@ implementations are in the RECON repo; design lives here.
 - 18,855 transcripts in `/mnt/library/_sources/streamecho6/`.
 - Old stream-B `new_pipeline` ran off `/mnt/library/_acquired/`.
 - `scan_library()` polled the NFS mount for new PDFs — now deprecated.
+- *Storage migration note:* `/mnt/library` was historically an NFS
+  mount from `pi-nas:/export/library`, which is what `current-state.md`
+  and `scan_library()` were written against. The library has since
+  been migrated to local SSD on the data Proxmox host
+  (`/mnt/data/library`) and surfaced into CT 130 via an LXC
+  bind-mount. The pi-nas copy was wiped on 2026-04-15. Path strings
+  inside the codebase didn't change; only the underlying storage did.
 
 ---
 
@@ -537,7 +544,8 @@ tail -f /opt/recon/logs/recon.log
 ```bash
 # Local DB backup before risky operations
 cp /opt/recon/data/recon.db /tmp/recon.db.bak.$(date +%s)
-# Contabo offsite (automatic): rsync every 6 hours, see recon-backup.timer
+# Offsite backup: planned, not yet configured (TBD — likely rsync to
+# pi-nas:/export/recon-backup once a backup target is provisioned).
 ```
 
 ### Inspect pipeline state at a glance
@@ -590,6 +598,11 @@ curl -s http://100.64.0.14:6333/collections/recon_knowledge_hybrid \
   filter excludes them.
 - **PeerTube 429.** Respect `peertube.rate_limit_delay` between caption
   API calls or you'll get throttled.
+- **Library is an LXC bind-mount, not NFS.** `/mnt/library` on CT 130 is
+  bound from the data Proxmox host's `/mnt/data/library` (local ext4 on
+  /dev/sda1). File ownership/UID-GID is shared with the host — writes
+  from inside the container appear with the container UID on the host.
+  No NFS, no `root_squash`, no network in the path.
 - **SSH heredocs with Python code break.** When editing remote files,
   write to a temp file via `scp` or `cat > file` rather than bash
   heredocs with parens/quotes.
@@ -603,9 +616,10 @@ curl -s http://100.64.0.14:6333/collections/recon_knowledge_hybrid \
 | Host | Role | Access |
 |---|---|---|
 | CT 130 (192.168.1.130 / 100.64.0.24) | RECON service | `ssh zvx@192.168.1.130` (key auth) |
+| data host (192.168.1.240) | Proxmox node hosting CT 130; `/mnt/data/library` source for the CT 130 bind-mount | `ssh root@192.168.1.240` |
 | cortex VM (192.168.1.150) | Qdrant, TEI, sparse svc, Ollama | `ssh zvx@cortex` |
 | CT 110 (192.168.1.170) | PeerTube `stream.echo6.co` | `ssh zvx@192.168.1.170` |
-| pi-nas (192.168.1.245) | NFS server for `/mnt/library` | `ssh zvx@pi-nas` |
+| pi-nas (192.168.1.245) | Backup target (planned; not yet configured). ~22T pool with ~300G free after library wipe. | `ssh zvx@pi-nas` |
 | CT 101 (192.168.1.101) | Caddy reverse proxy (home) | `ssh root@192.168.1.241 'pct exec 101'` |
 
 Secrets: `/home/zvx/projects/.ref/credentials` on TOC (this machine).
@@ -644,6 +658,13 @@ RECON Gemini/PeerTube keys: `/opt/recon/.env` on CT 130.
   Qdrant snapshots are not in any backup rotation. If CT 130 or cortex
   lose their disks, these are the hardest to regenerate (Gemini calls +
   embedding compute).
+- **Backup architecture** — no offsite backup is currently configured.
+  Section 15 references a planned rsync-to-pi-nas job, but neither the
+  script nor the systemd timer (`recon-backup.timer`) exist. Decide
+  what gets backed up (`recon.db`, `concepts/`, `text/`, Qdrant
+  snapshots, `/mnt/library`?), where, and on what cadence; pi-nas has
+  ~300G free in `/export/` after the 2026-04-15 library wipe and could
+  be the target for a first pass.
 - **`signal-archive/` in `/mnt/library/`** — 44 Signal/Matrix chat log
   files, not library content. Matt intends these to "eventually
   contribute" to the knowledge base but no ingestion path exists yet.
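
Review note: the gotcha added in this diff states that `/mnt/library` on CT 130 is a bind-mount with no NFS in the path. One way to confirm that from inside the container might look like the sketch below. It assumes `findmnt` from util-linux is installed; `classify_fstype` and `check_mount` are hypothetical helper names, not part of RECON, and the list of network filesystem types is illustrative rather than exhaustive.

```shell
#!/bin/sh
# Sketch only: report whether a path sits on a network or local filesystem.
# classify_fstype and check_mount are hypothetical helpers, not RECON code.

# Map a filesystem type string to a coarse class.
classify_fstype() {
  case "$1" in
    "") echo "unknown" ;;                                   # findmnt gave nothing
    nfs|nfs4|cifs|smb3|sshfs|fuse.sshfs) echo "network" ;;  # illustrative list
    *) echo "local" ;;                                      # ext4, xfs, zfs, ...
  esac
}

# Print the filesystem type, source, and class of the mount holding a path.
# The body runs in a subshell, so its variables stay private.
check_mount() (
  target="$1"
  # findmnt -n drops the header, -o selects columns, --target resolves the
  # mount that contains the given path. Failures fall through as empty.
  fstype="$(findmnt -n -o FSTYPE --target "$target" 2>/dev/null || true)"
  source="$(findmnt -n -o SOURCE --target "$target" 2>/dev/null || true)"
  printf '%s: fstype=%s source=%s class=%s\n' \
    "$target" "${fstype:-?}" "${source:-?}" "$(classify_fstype "$fstype")"
)

# Default to the library path from the diff; pass another path to override.
check_mount "${1:-/mnt/library}"
```

Run inside CT 130 with no arguments to check `/mnt/library`. If the migration described in the diff is in effect, the class should read `local`; the exact `source` string depends on how Proxmox exposes the bind-mount, so treat that field as informational.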