RECON Operations Runbook
Service Info
- Host: recon-vm (VM 131 on data node) — migrated from CT 130 on 2026-04-19
- IP: 192.168.1.130 / 100.64.0.24
- Install: /opt/recon/
- User: zvx
- Services: recon.service, recon-watchdog.service, kiwix.service (systemd)
Service Management
ssh zvx@100.64.0.24
sudo systemctl start|stop|restart|status recon
journalctl -u recon -f
Health Check
curl -s http://100.64.0.24:8420/api/health | python3 -m json.tool
# Returns: healthy (200), degraded/unhealthy (503)
# Checks: Qdrant, TEI, NFS, Gemini keys, pipeline counts
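For scripted monitoring, the same check can be sketched in Python. This is a minimal sketch and not part of RECON itself: the function names `classify` and `check_health` are hypothetical, the runbook does not document the shape of the JSON body, so the sketch only pretty-prints whatever comes back, and the "unreachable" and "unexpected" categories are additions for cases the runbook does not cover.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://100.64.0.24:8420/api/health"  # endpoint from this runbook


def classify(status_code: int) -> str:
    """Map the HTTP status code to the categories listed in the runbook."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "degraded/unhealthy"
    return "unexpected"  # assumption: any other code is treated as unknown


def check_health(url: str = HEALTH_URL, timeout: float = 5.0):
    """Return (category, parsed JSON body or None)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the body is still readable.
        code, raw = err.code, err.read()
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return "unreachable", None
    try:
        body = json.loads(raw)
    except ValueError:
        body = None  # body was not valid JSON
    return classify(code), body


if __name__ == "__main__":
    category, body = check_health()
    print(category)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Catching `HTTPError` separately matters here because a 503 is an expected answer from this endpoint, not a transport failure.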
Pipeline Status
ssh zvx@100.64.0.24
cd /opt/recon && source venv/bin/activate
python3 recon.py status # Summary counts
python3 recon.py failures # Failed documents
python3 recon.py search "query" # Test search
Dashboard
- URL: http://100.64.0.24:8420
- Shows: pipeline progress, per-source breakdown, Qdrant stats
- Auto-refreshes every 30s
Common Operations
cd /opt/recon && source venv/bin/activate
# Add a PDF
python3 recon.py upload --file /path/to.pdf --category "Reference"
# Add web content
python3 recon.py ingest-url "https://example.com/article" --process
# Crawl a website
python3 recon.py crawl "https://docs.example.com" --process
# Manual pipeline run (normally automatic via service)
python3 recon.py extract
python3 recon.py enrich
python3 recon.py embed
# Scan library for new PDFs (normally hourly via service)
python3 recon.py scan
python3 recon.py queue
Dependencies
| Service | Host | Port | Purpose |
|---|---|---|---|
| Qdrant | cortex | 6333 | Vector DB (recon_knowledge collection) |
| TEI | cortex | 8090 | Text embeddings (bge-m3, 1024-dim) |
| Ollama | cortex | 11434 | Chat model for Aurora RAG |
| NFS | pi-nas | — | /mnt/library (PDF source) |
| Gemini API | — | — | Enrichment + vision OCR (4 keys in .env) |
| Contabo VPS | 100.64.0.1 | — | Backup destination |
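When /api/health itself is unreachable, a quick TCP probe of the dependencies that list a port can narrow things down. The sketch below is illustrative only: `DEPENDENCIES` and `is_reachable` are hypothetical names, and an open port says nothing about whether the service behind it is actually healthy.

```python
import socket

# Host/port pairs copied from the table above. NFS, the Gemini API and the
# backup host list no port there, so they are left out of this sketch.
DEPENDENCIES = {
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


if __name__ == "__main__":
    for name, (host, port) in DEPENDENCIES.items():
        state = "up" if is_reachable(host, port) else "DOWN"
        print(f"{name:8s} {host}:{port} {state}")
```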
Backups
- Destination: root@100.64.0.1:/opt/backups/recon/
- Full sync (concepts, text, DB, config): every 6 hours via cron
- DB snapshot only: every 2 hours via cron
- Script: /opt/recon/scripts/backup.sh
Verify backups
ssh root@100.64.0.1 'ls -lh /opt/backups/recon/recon_*.db && du -sh /opt/backups/recon/'
Troubleshooting
Pipeline stalled (no progress)
journalctl -u recon -n 50 # Check errors
curl -s http://100.64.0.24:8420/api/health # Check dependencies
sudo systemctl restart recon # Restart
Gemini rate limits (429 errors)
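Built-in: exponential backoff 5s→10s→20s→40s→80s with jitter. A window that still fails after the retries is skipped and enrichment continues with the next one; partial enrichment beats zero.
If the 429s are sustained: reduce enrich_workers in config.yaml, then restart the service (sudo systemctl restart recon).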
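The retry behaviour described above can be sketched roughly as follows. This is not RECON's actual code: the function names, the `is_rate_limited` callback, and the jitter formula (up to +25% on top of the nominal delay) are all assumptions made for illustration. Only the 5s to 80s doubling schedule and the skip-on-exhaustion behaviour come from the runbook.

```python
import random
import time


def backoff_delays(base: float = 5.0, factor: float = 2.0, retries: int = 5):
    """Nominal delays before jitter: 5, 10, 20, 40, 80 seconds."""
    return [base * factor ** i for i in range(retries)]


def call_with_backoff(fn, is_rate_limited, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on rate-limit errors with exponential backoff.

    Returns fn()'s result, or None once every retry is used up, so the
    caller can skip this window and carry on with the next one.
    Errors that are not rate limits are re-raised immediately.
    """
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        try:
            return fn()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise
            if attempt == len(delays):
                return None  # retries exhausted: give up on this window
            # Assumed jitter: stretch the nominal delay by 0-25%.
            sleep(delays[attempt] * (1 + 0.25 * rng()))
```

With five retries the worst case waits a little over 155 seconds per window before it is skipped, which is one reason lowering enrich_workers helps more than waiting when the limit is sustained.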
Qdrant down
ssh zvx@cortex
docker ps | grep qdrant
docker restart qdrant
# If data lost: ssh zvx@100.64.0.24 'cd /opt/recon && source venv/bin/activate && python3 recon.py rebuild'
TEI down
ssh zvx@cortex
docker ps | grep tei
docker restart tei
NFS mount lost
ssh zvx@100.64.0.24
mount | grep library
sudo mount -a
sudo systemctl restart recon
Reset stuck documents
cd /opt/recon && source venv/bin/activate
# Find stuck transitional states
sqlite3 data/recon.db "SELECT status, COUNT(*) FROM documents WHERE status IN ('extracting','enriching','embedding') GROUP BY status;"
# Reset them
sqlite3 data/recon.db "UPDATE documents SET status='queued' WHERE status='extracting';"
sqlite3 data/recon.db "UPDATE documents SET status='extracted' WHERE status='enriching';"
sqlite3 data/recon.db "UPDATE documents SET status='enriched' WHERE status='embedding';"
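The three updates can also be applied in a single transaction, so a failure partway through leaves nothing half-reset. A minimal sketch, assuming only the `documents` table and `status` column used in the commands above; `RESET_MAP` and `reset_stuck` are hypothetical names, not part of recon.py.

```python
import sqlite3

# Transitional status -> status to fall back to (same mapping as above).
RESET_MAP = {
    "extracting": "queued",
    "enriching": "extracted",
    "embedding": "enriched",
}


def reset_stuck(db_path: str = "data/recon.db") -> dict:
    """Reset stuck documents; return {old_status: rows_changed}."""
    changed = {}
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if any UPDATE fails
            for old, new in RESET_MAP.items():
                cur = conn.execute(
                    "UPDATE documents SET status = ? WHERE status = ?",
                    (new, old),
                )
                changed[old] = cur.rowcount
    finally:
        conn.close()
    return changed


if __name__ == "__main__":
    # Run from /opt/recon so the relative default path resolves.
    print(reset_stuck())
```

The returned counts should match the numbers from the SELECT above, which gives a quick sanity check that nothing else was touched.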
Full recovery from Contabo backup
ssh zvx@100.64.0.24
sudo systemctl stop recon
rsync -av root@100.64.0.1:/opt/backups/recon/concepts/ /opt/recon/data/concepts/
rsync -av root@100.64.0.1:/opt/backups/recon/text/ /opt/recon/data/text/
# Pick the latest DB backup
rsync -av root@100.64.0.1:/opt/backups/recon/recon_latest.db /opt/recon/data/recon.db
cd /opt/recon && source venv/bin/activate
python3 recon.py rebuild # Rebuilds Qdrant from concept JSONs
sudo systemctl start recon
Key Files
| Path | Purpose |
|---|---|
| /opt/recon/config.yaml | All configuration |
| /opt/recon/.env | Gemini API keys (GEMINI_KEY_1 through GEMINI_KEY_4) |
| /opt/recon/data/recon.db | SQLite status DB |
| /opt/recon/data/concepts/ | Gemini extraction results (CRITICAL: costs $ to regenerate) |
| /opt/recon/data/text/ | Extracted page text (regenerable from PDFs) |
| /opt/recon/PROJECT-BIBLE.md | Full system documentation |
| /opt/recon/scripts/backup.sh | Backup script |
| /opt/recon/scripts/validate.py | Pipeline consistency checker |
| /opt/recon/scripts/rebuild_qdrant.py | Nuclear Qdrant rebuild |
Pipeline Architecture
/mnt/library/ (NFS)
     │
     ▼ hourly scan
[Catalogue] → [Queue] → [Extract] → [Enrich] → [Embed] → [Complete]
                        4 workers   16 workers 4 workers
                        PyPDF2      Gemini     TEI+Qdrant
                        pdftotext   2.0 Flash  bge-m3
                        Tesseract              1024-dim
                        Gemini Vision
Last updated: 2026-04-19 — Updated for CT 130 → VM 131 migration