# RECON Operations Runbook

## Service Info

- **Host:** recon-vm (VM 131 on data node); migrated from CT 130 on 2026-04-19
- **IP:** 192.168.1.130 / 100.64.0.24
- **Install:** /opt/recon/
- **User:** zvx
- **Services:** `recon.service`, `recon-watchdog.service`, `kiwix.service` (systemd)

## Service Management

```bash
ssh zvx@100.64.0.24
sudo systemctl status recon    # or: start | stop | restart
journalctl -u recon -f
```

## Health Check

```bash
curl -s http://100.64.0.24:8420/api/health | python3 -m json.tool
# Returns: healthy (200), degraded/unhealthy (503)
# Checks: Qdrant, TEI, NFS, Gemini keys, pipeline counts
```

## Pipeline Status

```bash
ssh zvx@100.64.0.24
cd /opt/recon && source venv/bin/activate
python3 recon.py status           # Summary counts
python3 recon.py failures         # Failed documents
python3 recon.py search "query"   # Test search
```

## Dashboard

- **URL:** http://100.64.0.24:8420
- Shows: pipeline progress, per-source breakdown, Qdrant stats
- Auto-refreshes every 30s

## Common Operations

```bash
cd /opt/recon && source venv/bin/activate

# Add a PDF
python3 recon.py upload --file /path/to.pdf --category "Reference"

# Add web content
python3 recon.py ingest-url "https://example.com/article" --process

# Crawl a website
python3 recon.py crawl "https://docs.example.com" --process

# Manual pipeline run (normally automatic via service)
python3 recon.py extract
python3 recon.py enrich
python3 recon.py embed

# Scan library for new PDFs (normally hourly via service)
python3 recon.py scan
python3 recon.py queue
```

## Dependencies

| Service | Host | Port | Purpose |
|---------|------|------|---------|
| Qdrant | cortex | 6333 | Vector DB (recon_knowledge collection) |
| TEI | cortex | 8090 | Text embeddings (bge-m3, 1024-dim) |
| Ollama | cortex | 11434 | Chat model for Aurora RAG |
| NFS | pi-nas | n/a | /mnt/library (PDF source) |
| Gemini API | Google | n/a | Enrichment + vision OCR (4 keys in .env) |
| Contabo VPS | 100.64.0.1 | n/a | Backup destination |

## Backups

- **Destination:** `root@100.64.0.1:/opt/backups/recon/`
- **Full sync (concepts, text, DB, config):** every 6 hours via cron
- **DB snapshot only:** every 2 hours via cron
- **Script:** `/opt/recon/scripts/backup.sh`

### Verify backups

```bash
ssh root@100.64.0.1 'ls -lh /opt/backups/recon/recon_*.db && du -sh /opt/backups/recon/'
```

## Troubleshooting

### Pipeline stalled (no progress)

```bash
journalctl -u recon -n 50                     # Check errors
curl -s http://100.64.0.24:8420/api/health    # Check dependencies
sudo systemctl restart recon                  # Restart
```

### Gemini rate limits (429 errors)

Built-in: exponential backoff 5s→10s→20s→40s→80s with jitter.
A window that still fails is skipped and the pipeline continues: partial enrichment beats zero.
If the 429s are sustained: reduce `enrich_workers` in config.yaml, then restart the service.

### Qdrant down

```bash
ssh zvx@cortex
docker ps | grep qdrant
docker restart qdrant
# If data lost:
ssh zvx@100.64.0.24 'cd /opt/recon && source venv/bin/activate && python3 recon.py rebuild'
```

### TEI down

```bash
ssh zvx@cortex
docker ps | grep tei
docker restart tei
```

### NFS mount lost

```bash
ssh zvx@100.64.0.24
mount | grep library
sudo mount -a
sudo systemctl restart recon
```

### Reset stuck documents

```bash
cd /opt/recon && source venv/bin/activate

# Find stuck transitional states
sqlite3 data/recon.db "SELECT status, COUNT(*) FROM documents WHERE status IN ('extracting','enriching','embedding') GROUP BY status;"

# Reset them
sqlite3 data/recon.db "UPDATE documents SET status='queued' WHERE status='extracting';"
sqlite3 data/recon.db "UPDATE documents SET status='extracted' WHERE status='enriching';"
sqlite3 data/recon.db "UPDATE documents SET status='enriched' WHERE status='embedding';"
```

### Full recovery from Contabo backup

```bash
ssh zvx@100.64.0.24
sudo systemctl stop recon

rsync -av root@100.64.0.1:/opt/backups/recon/concepts/ /opt/recon/data/concepts/
rsync -av root@100.64.0.1:/opt/backups/recon/text/ /opt/recon/data/text/
# Pick the latest DB backup
rsync -av root@100.64.0.1:/opt/backups/recon/recon_latest.db /opt/recon/data/recon.db

cd /opt/recon && source venv/bin/activate
python3 recon.py rebuild    # Rebuilds Qdrant from concept JSONs
sudo systemctl start recon
```

## Key Files

| Path | Purpose |
|------|---------|
| `/opt/recon/config.yaml` | All configuration |
| `/opt/recon/.env` | Gemini API keys (GEMINI_KEY_1 through GEMINI_KEY_4) |
| `/opt/recon/data/recon.db` | SQLite status DB |
| `/opt/recon/data/concepts/` | Gemini extraction results (CRITICAL: costs $ to regenerate) |
| `/opt/recon/data/text/` | Extracted page text (regenerable from PDFs) |
| `/opt/recon/PROJECT-BIBLE.md` | Full system documentation |
| `/opt/recon/scripts/backup.sh` | Backup script |
| `/opt/recon/scripts/validate.py` | Pipeline consistency checker |
| `/opt/recon/scripts/rebuild_qdrant.py` | Nuclear Qdrant rebuild |

## Pipeline Architecture

```
/mnt/library/ (NFS)
       │
       ▼ hourly scan
[Catalogue] → [Queue] → [Extract] → [Enrich] → [Embed] → [Complete]
                        4 workers   16 workers 4 workers
                        PyPDF2      Gemini     TEI+Qdrant
                        pdftotext   2.0 Flash  bge-m3
                        Tesseract              1024-dim
                        Gemini Vision
```

---

*Last updated: 2026-04-19 (updated for CT 130 → VM 131 migration)*
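
The backoff schedule listed under "Gemini rate limits (429 errors)" can be sketched as below, which helps when estimating how long a single window may block before it is skipped. This is only an illustration, not RECON's actual code: the function name `backoff_delays`, the `jitter_fraction` parameter, and the additive uniform jitter are assumptions, since the runbook states only the 5s→10s→20s→40s→80s progression "with jitter".

```python
import random

BASE_DELAY_S = 5   # first delay in the runbook's schedule
MAX_RETRIES = 5    # yields 5s, 10s, 20s, 40s, 80s


def backoff_delays(base=BASE_DELAY_S, retries=MAX_RETRIES,
                   jitter_fraction=0.25, rng=None):
    """Return one sleep duration (seconds) per retry attempt.

    Each nominal delay doubles the previous one. The jitter shape
    (uniform, up to jitter_fraction of the nominal delay, added on top)
    is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        nominal = base * (2 ** attempt)
        jitter = rng.uniform(0, jitter_fraction * nominal)
        delays.append(nominal + jitter)
    return delays


# Worst case without jitter: 5 + 10 + 20 + 40 + 80 = 155 seconds per window.
worst_case_no_jitter = sum(backoff_delays(jitter_fraction=0))
```

With 16 enrich workers each able to wait this long, lowering `enrich_workers` reduces the number of requests competing for the same quota.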
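
The three `UPDATE` statements under "Reset stuck documents" could also be applied in a single transaction, so that a failure part-way through leaves the database unchanged. The sketch below assumes only what the runbook's SQL shows (a `documents` table with a `status` column); the helper name `reset_stuck` is hypothetical and not part of `recon.py`.

```python
import sqlite3

# Transitional status -> status it is rolled back to (taken from the runbook's SQL).
ROLLBACK = {
    "extracting": "queued",
    "enriching": "extracted",
    "embedding": "enriched",
}


def reset_stuck(conn):
    """Roll back documents stuck in a transitional state.

    Returns a dict mapping each transitional status to the number of rows reset.
    """
    changed = {}
    with conn:  # commits on success, rolls back if any UPDATE raises
        for stuck, previous in ROLLBACK.items():
            cur = conn.execute(
                "UPDATE documents SET status = ? WHERE status = ?",
                (previous, stuck),
            )
            changed[stuck] = cur.rowcount
    return changed


# Usage on the host (path from the Key Files table):
#   conn = sqlite3.connect("/opt/recon/data/recon.db")
#   print(reset_stuck(conn))
#   conn.close()
```

The returned counts can be compared against the `SELECT ... GROUP BY status` query from the same section to confirm that every stuck row was reset.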
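
For scripted monitoring, the "Health Check" behaviour (200 for healthy, 503 for degraded/unhealthy) might be wrapped as follows. This is a sketch under assumptions: the helper names `classify` and `fetch_health` are hypothetical, and the JSON body is returned unparsed beyond `json.loads` because the runbook does not document its fields.

```python
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://100.64.0.24:8420/api/health"  # endpoint from this runbook


def classify(status_code):
    """Map the HTTP status to the states listed in the Health Check section."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "degraded/unhealthy"
    return "unexpected"


def fetch_health(url=HEALTH_URL, timeout=5):
    """Return (classification, parsed JSON body or None).

    Connection failures (urllib.error.URLError) are not caught here and
    propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:  # urllib raises on 503 responses
        code, raw = err.code, err.read()
    try:
        body = json.loads(raw)
    except ValueError:  # body was not valid JSON
        body = None
    return classify(code), body
```

Catching `HTTPError` matters here because `urlopen` treats a 503 as an exception, while the response body still carries the dependency details worth logging.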