
RECON Operations Runbook

Service Info

  • Host: recon-vm (VM 131 on data node) — migrated from CT 130 on 2026-04-19
  • IP: 192.168.1.130 / 100.64.0.24
  • Install: /opt/recon/
  • User: zvx
  • Services: recon.service, recon-watchdog.service, kiwix.service (systemd)

Service Management

ssh zvx@100.64.0.24
sudo systemctl status recon        # or: start | stop | restart
journalctl -u recon -f

Health Check

curl -s http://100.64.0.24:8420/api/health | python3 -m json.tool
# Returns: healthy (200), degraded/unhealthy (503)
# Checks: Qdrant, TEI, NFS, Gemini keys, pipeline counts
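
For scripted monitoring, the status-code contract above can be wrapped in a few lines of Python. This is a minimal sketch using only the standard library. The function names are illustrative and not part of RECON, and the response body is returned untouched because its JSON schema is not documented in this runbook.

```python
import urllib.error
import urllib.request

HEALTH_URL = "http://100.64.0.24:8420/api/health"


def classify(status_code: int) -> str:
    """Map the HTTP status of /api/health to the states listed above."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "degraded/unhealthy"
    return "unexpected"


def check_health(url: str = HEALTH_URL, timeout: float = 5.0) -> tuple[str, str]:
    """Return (state, raw response body or error text). Does not raise on a 503."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status), resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses, but the body is still readable.
        return classify(err.code), err.read().decode("utf-8", "replace")
    except OSError as err:
        # Connection refused, DNS failure, timeout (URLError is an OSError).
        return "unreachable", str(err)


# Example:
#   state, body = check_health()
#   print(state)
```

Anything other than "healthy" is a cue to look at the raw body, which should show which dependency check failed.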

Pipeline Status

ssh zvx@100.64.0.24
cd /opt/recon && source venv/bin/activate
python3 recon.py status            # Summary counts
python3 recon.py failures          # Failed documents
python3 recon.py search "query"    # Test search

Dashboard

  • URL: http://100.64.0.24:8420
  • Shows: pipeline progress, per-source breakdown, Qdrant stats
  • Auto-refreshes every 30s

Common Operations

cd /opt/recon && source venv/bin/activate

# Add a PDF
python3 recon.py upload --file /path/to.pdf --category "Reference"

# Add web content
python3 recon.py ingest-url "https://example.com/article" --process

# Crawl a website
python3 recon.py crawl "https://docs.example.com" --process

# Manual pipeline run (normally automatic via service)
python3 recon.py extract
python3 recon.py enrich
python3 recon.py embed

# Scan library for new PDFs (normally hourly via service)
python3 recon.py scan
python3 recon.py queue

Dependencies

| Service     | Host       | Port  | Purpose                                  |
|-------------|------------|-------|------------------------------------------|
| Qdrant      | cortex     | 6333  | Vector DB (recon_knowledge collection)   |
| TEI         | cortex     | 8090  | Text embeddings (bge-m3, 1024-dim)       |
| Ollama      | cortex     | 11434 | Chat model for Aurora RAG                |
| NFS         | pi-nas     |       | /mnt/library (PDF source)                |
| Gemini API  | Google     |       | Enrichment + vision OCR (4 keys in .env) |
| Contabo VPS | 100.64.0.1 |       | Backup destination                       |
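
When the health endpoint itself is unavailable, a quick TCP probe of the dependencies listed above can narrow things down. A minimal sketch, assuming the host name cortex resolves from wherever the script runs. The helper names are hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Hosts and ports as listed above.
DEPENDENCIES = {
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_all(deps=DEPENDENCIES) -> dict[str, bool]:
    """Probe every dependency and return a name -> reachable mapping."""
    return {name: port_open(host, port) for name, (host, port) in deps.items()}


# Example:
#   for name, ok in check_all().items():
#       print(f"{name}: {'up' if ok else 'DOWN'}")
```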

Backups

  • Destination: root@100.64.0.1:/opt/backups/recon/
  • Full sync (concepts, text, DB, config): every 6 hours via cron
  • DB snapshot only: every 2 hours via cron
  • Script: /opt/recon/scripts/backup.sh

Verify backups

ssh root@100.64.0.1 'ls -lh /opt/backups/recon/recon_*.db && du -sh /opt/backups/recon/'
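
To turn the listing above into a pass/fail check, one option is to compare the newest snapshot's modification time against the 2-hour schedule. A minimal sketch, assuming it runs on the backup host itself and that snapshots match the recon_*.db pattern used in the command above. The function name and the 3-hour threshold in the example are illustrative, not part of RECON.

```python
import time
from pathlib import Path
from typing import Optional


def newest_age_hours(
    directory: str,
    pattern: str = "recon_*.db",
    now: Optional[float] = None,
) -> Optional[float]:
    """Age in hours of the most recently modified match, or None if nothing matches."""
    mtimes = [p.stat().st_mtime for p in Path(directory).glob(pattern)]
    if not mtimes:
        return None
    now = time.time() if now is None else now
    return (now - max(mtimes)) / 3600


# Example (threshold is an assumption: 2-hour schedule plus one hour of slack):
#   age = newest_age_hours("/opt/backups/recon")
#   if age is None or age > 3:
#       print("DB snapshot missing or stale")
```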

Troubleshooting

Pipeline stalled (no progress)

journalctl -u recon -n 50                      # Check errors
curl -s http://100.64.0.24:8420/api/health      # Check dependencies
sudo systemctl restart recon                    # Restart

Gemini rate limits (429 errors)

Built-in: exponential backoff with jitter (5s → 10s → 20s → 40s → 80s). A window that still fails is skipped and enrichment continues with the remaining windows; partial enrichment beats none.

If the 429s are sustained: reduce enrich_workers in /opt/recon/config.yaml, then restart recon.
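
The retry schedule described above can be sketched as follows. This is an illustration of exponential backoff with jitter, not RECON's actual code: the RateLimited exception, the call_with_backoff helper, and the jitter formula (up to 25% added on top of each delay) are all assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BASE_DELAYS = [5 * 2**i for i in range(5)]  # 5, 10, 20, 40, 80 seconds


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 from the Gemini API."""


def call_with_backoff(
    fn: Callable[[], T],
    delays=BASE_DELAYS,
    jitter: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Try fn once, then once more after each delay; None if every attempt is rate limited."""
    for delay in [0, *delays]:
        if delay:
            # Random extra wait so parallel workers do not retry in lockstep.
            sleep(delay * (1 + random.uniform(0, jitter)))
        try:
            return fn()
        except RateLimited:
            continue
    return None  # Caller skips this window and moves on to the next one.
```

Returning None rather than raising mirrors the skip-and-continue behaviour: one exhausted window does not abort the rest of the document.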

Qdrant down

ssh zvx@cortex
docker ps | grep qdrant
docker restart qdrant
# If data lost: ssh zvx@100.64.0.24 'cd /opt/recon && source venv/bin/activate && python3 recon.py rebuild'

TEI down

ssh zvx@cortex
docker ps | grep tei
docker restart tei

NFS mount lost

ssh zvx@100.64.0.24
mount | grep library
sudo mount -a
sudo systemctl restart recon

Reset stuck documents

cd /opt/recon && source venv/bin/activate
# Find stuck transitional states
sqlite3 data/recon.db "SELECT status, COUNT(*) FROM documents WHERE status IN ('extracting','enriching','embedding') GROUP BY status;"
# Reset them
sqlite3 data/recon.db "UPDATE documents SET status='queued' WHERE status='extracting';"
sqlite3 data/recon.db "UPDATE documents SET status='extracted' WHERE status='enriching';"
sqlite3 data/recon.db "UPDATE documents SET status='enriched' WHERE status='embedding';"
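
The three UPDATE statements above can also be applied in a single transaction, so a failure partway through leaves the database unchanged. A minimal sketch using Python's built-in sqlite3 module. It assumes only the documents table and status column seen in the queries above, and the function name is hypothetical. As with the manual commands, it is safest to run while the pipeline is stalled or recon is stopped, since rows in a transitional state may otherwise be in active processing.

```python
import sqlite3

# Transitional status -> status it is rolled back to (same mapping as the commands above).
ROLLBACK = {
    "extracting": "queued",
    "enriching": "extracted",
    "embedding": "enriched",
}


def reset_stuck(conn: sqlite3.Connection) -> dict[str, int]:
    """Roll back every stuck document; return the number of rows reset per status."""
    counts = {}
    with conn:  # commits on success, rolls back if any UPDATE fails
        for stuck, previous in ROLLBACK.items():
            cur = conn.execute(
                "UPDATE documents SET status = ? WHERE status = ?",
                (previous, stuck),
            )
            counts[stuck] = cur.rowcount
    return counts


# Example (DB path from this runbook):
#   conn = sqlite3.connect("/opt/recon/data/recon.db")
#   print(reset_stuck(conn))
#   conn.close()
```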

Full recovery from Contabo backup

ssh zvx@100.64.0.24
sudo systemctl stop recon
rsync -av root@100.64.0.1:/opt/backups/recon/concepts/ /opt/recon/data/concepts/
rsync -av root@100.64.0.1:/opt/backups/recon/text/ /opt/recon/data/text/
# Pick the latest DB backup
rsync -av root@100.64.0.1:/opt/backups/recon/recon_latest.db /opt/recon/data/recon.db
cd /opt/recon && source venv/bin/activate
python3 recon.py rebuild       # Rebuilds Qdrant from concept JSONs
sudo systemctl start recon

Key Files

| Path                                 | Purpose                                                     |
|--------------------------------------|-------------------------------------------------------------|
| /opt/recon/config.yaml               | All configuration                                           |
| /opt/recon/.env                      | Gemini API keys (GEMINI_KEY_1 through GEMINI_KEY_4)         |
| /opt/recon/data/recon.db             | SQLite status DB                                            |
| /opt/recon/data/concepts/            | Gemini extraction results (CRITICAL: costs $ to regenerate) |
| /opt/recon/data/text/                | Extracted page text (regenerable from PDFs)                 |
| /opt/recon/PROJECT-BIBLE.md          | Full system documentation                                   |
| /opt/recon/scripts/backup.sh         | Backup script                                               |
| /opt/recon/scripts/validate.py       | Pipeline consistency checker                                |
| /opt/recon/scripts/rebuild_qdrant.py | Nuclear Qdrant rebuild                                      |

Pipeline Architecture

/mnt/library/ (NFS)
    │
    ▼ hourly scan
[Catalogue] → [Queue] → [Extract] → [Enrich] → [Embed] → [Complete]
                          4 workers    16 workers  4 workers
                          PyPDF2       Gemini      TEI+Qdrant
                          pdftotext    2.0 Flash   bge-m3
                          Tesseract                1024-dim
                          Gemini Vision

Last updated: 2026-04-19 — Updated for CT 130 → VM 131 migration