
RECON -- Knowledge Extraction Pipeline

Extracts structured knowledge from PDFs and web content into a Qdrant vector database for retrieval-augmented generation (RAG) by Aurora, the assistant served through OpenWebUI.

Quick Start

# Activate the virtual environment
cd /opt/recon && source venv/bin/activate

# Scan the library for new PDFs
recon scan

# Queue and process
recon queue
recon extract
recon enrich
recon embed

# Or run the full pipeline
recon run

# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process

# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process

# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"

# Search
recon search "water purification methods"

# Check pipeline status and list failures
recon status
recon failures
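To ingest a batch of pages rather than one at a time, the ingest-url command above can be driven from a URL list. This is a minimal sketch, not part of RECON itself: it assumes a plain-text file with one URL per line, uses only the --category and --process flags shown above, and adds a hypothetical RECON_BIN override so the loop can be dry-run with RECON_BIN=echo.

```shell
#!/usr/bin/env bash
# Sketch: feed a list of URLs to `recon ingest-url`, one call per URL.
# Assumptions: the list file is your own (one URL per line); RECON_BIN is a
# hypothetical override that defaults to the real `recon` CLI.
ingest_url_list() {
  local list_file="$1" category="$2"
  local recon_bin="${RECON_BIN:-recon}"
  local url failed=0
  # `|| [[ -n "$url" ]]` also handles a last line with no trailing newline.
  while IFS= read -r url || [[ -n "$url" ]]; do
    # Skip blank lines and '#' comment lines.
    [[ -z "$url" || "$url" == \#* ]] && continue
    # </dev/null stops the command from consuming the rest of the list on stdin.
    if ! "$recon_bin" ingest-url "$url" --category "$category" --process </dev/null; then
      echo "FAILED: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$list_file"
  # Exit status 1 if any URL failed, 0 otherwise.
  return $(( failed > 0 ))
}
```

For example, ingest_url_list urls.txt "Category" ingests each listed page in turn and returns non-zero if any call failed; recon status and recon failures can then be checked as usual.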

Dashboard

http://100.64.0.24:8420

Services

| Service | Location | Purpose |
|---|---|---|
| RECON Dashboard | recon:8420 | Pipeline management + API |
| Qdrant | cortex:6333 | Vector database |
| TEI | cortex:8090 | Embeddings (1,711/sec) |
| Ollama | cortex:11434 | Chat + fallback embeddings |
| OpenWebUI | cortex:8080 (ai.echo6.co) | Aurora chat with RAG |
| File Server | recon:8888 (files.echo6.co) | PDF downloads |
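A quick way to confirm that every service in the table is reachable is to probe each one over HTTP. This is a sketch under stated assumptions: the hostnames recon and cortex must resolve from the machine running it, the Qdrant, TEI, and Ollama probes use the health/version endpoints those projects commonly expose (/healthz, /health, /api/version), and the dashboard, OpenWebUI, and file server are simply probed at their root page. Check the paths against the versions actually deployed.

```shell
#!/usr/bin/env bash
# Sketch: HTTP reachability check for the services listed above.
# Assumptions: `recon`/`cortex` resolve via DNS or /etc/hosts; probe paths
# are common defaults and may need adjusting for your versions.
check_service() {
  local name="$1" url="$2"
  # -f: treat HTTP >= 400 as failure; -s: silent; short timeouts so a dead
  # host does not stall the whole check.
  if curl -fs --connect-timeout 2 --max-time 5 -o /dev/null "$url" 2>/dev/null; then
    printf '%-16s UP    %s\n' "$name" "$url"
  else
    printf '%-16s DOWN  %s\n' "$name" "$url"
    return 1
  fi
}

check_all_services() {
  local rc=0
  check_service "RECON Dashboard" "http://recon:8420/"               || rc=1
  check_service "Qdrant"          "http://cortex:6333/healthz"       || rc=1
  check_service "TEI"             "http://cortex:8090/health"        || rc=1
  check_service "Ollama"          "http://cortex:11434/api/version"  || rc=1
  check_service "OpenWebUI"       "http://cortex:8080/"              || rc=1
  check_service "File Server"     "http://recon:8888/"               || rc=1
  # Non-zero if any service was unreachable.
  return "$rc"
}
```

Running check_all_services prints one UP/DOWN line per service and exits non-zero if any probe failed, so it can also serve as a simple cron or monitoring hook.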

Key Paths

| Path | Contents |
|---|---|
| /opt/recon/ | Application code |
| /opt/recon/data/concepts/ | Gemini extractions (CRITICAL -- back these up) |
| /opt/recon/data/text/ | Extracted text |
| /opt/recon/data/recon.db | SQLite status DB |
| /mnt/library/ | PDF library (NFS mount from pi-nas) |

Backups

Backups run automatically every 6 hours to a Contabo VPS via /opt/recon/scripts/backup.sh. The concept JSONs are the most valuable data (more than $130 of Gemini API work). Qdrant is NOT backed up; it can be rebuilt from the concept JSONs in about 10 minutes with recon rebuild.
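The contents of backup.sh are not reproduced here. As an illustration only, a concept-JSON backup along these lines could archive /opt/recon/data/concepts/ with a UTC timestamp and push the archive off-host with rsync. BACKUP_DEST, the staging directory, and the cron line are hypothetical placeholders, not values taken from the real script.

```shell
#!/usr/bin/env bash
# Illustrative sketch, NOT the real /opt/recon/scripts/backup.sh.
# Hypothetical: the staging dir default and BACKUP_DEST
# (e.g. "user@vps-host:/path/to/recon-backups/").
archive_concepts() {
  local src="${1:-/opt/recon/data/concepts}"
  local stage_dir="${2:-/var/tmp/recon-backup}"
  local stamp archive
  mkdir -p "$stage_dir"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="$stage_dir/concepts-$stamp.tar.gz"
  # -C keeps archive paths relative (concepts/...), so a restore can be
  # extracted anywhere.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
  printf '%s\n' "$archive"
}

push_backup() {
  local archive="$1"
  local dest="${BACKUP_DEST:?set BACKUP_DEST to the remote target}"
  # -a preserves timestamps and permissions on the copy.
  rsync -a "$archive" "$dest"
}

# One possible schedule (every 6 hours) as a crontab entry:
# 0 */6 * * * /opt/recon/scripts/backup.sh
```

Only the JSONs need this treatment; since Qdrant is rebuilt from them with recon rebuild, archiving the vector store would duplicate data that is cheap to regenerate.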

Monitoring

# Pipeline status
recon status

# Follow the application log
tail -f /opt/recon/logs/recon.log

# Follow the pipeline run log
tail -f /opt/recon/pipeline.log

# Validate consistency
recon validate --deep

Full Documentation

See PROJECT-BIBLE.md for complete system documentation.