matt/recon

mirror of https://github.com/zvx-echo6/recon.git synced 2026-05-20 06:34:40 +02:00

Python 86.8%
HTML 6.1%
JavaScript 5.4%
CSS 1%
Shell 0.7%

Matt f35af18320 feat(place): gate Google Places API calls behind auth Guest users receive local and cached data only. New Google Places API calls are only triggered for authenticated users, protecting against cost exploitation on the public navi.echo6.co frontend. The pattern: cached Google data flows freely (already paid for by an authed lookup). New API calls require X-Authentik-Username via get_user_id() check.		2026-04-26 03:36:21 +00:00
config	feat: add has_contours feature flags for home and regional_pi profiles	2026-04-26 03:36:16 +00:00
lib	feat(place): gate Google Places API calls behind auth	2026-04-26 03:36:21 +00:00
scripts	Add Overture Maps POI enrichment layer for place details	2026-04-21 16:51:25 +00:00
static	Replace wget/SingleFile/Playwright backends with Zimit	2026-04-19 14:06:23 +00:00
templates	Add Nav-I API key management UI	2026-04-23 06:50:44 +00:00
.gitignore	Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup	2026-04-19 02:28:49 +00:00
api.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
config.yaml	Replace wget/SingleFile/Playwright backends with Zimit	2026-04-19 14:06:23 +00:00
enricher.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
migrate_paths.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
PROJECT-BIBLE.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
README.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
recon.py	Add scraper Phase 2: smart crawl mode detection + browser fallback	2026-04-18 18:26:43 +00:00
requirements.txt	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
run-pipeline-now.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
sweep_gated.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
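
The gating pattern described in the latest commit message above (cached data for everyone, new Google Places calls only for authenticated users) might look roughly like the sketch below. This is an illustration, not the repository's code: only `get_user_id()` and the `X-Authentik-Username` header are named in the commit. The cache dictionary, the `place_details` function, and the `fetch_from_google` callable are hypothetical stand-ins.

```python
from typing import Callable, Mapping, Optional

# Hypothetical in-memory cache standing in for whatever store the project uses.
_places_cache: dict[str, dict] = {}


def get_user_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the authenticated username, or None for guests.

    Assumption: the auth proxy sets X-Authentik-Username only on
    authenticated requests; the real get_user_id() may differ.
    """
    return headers.get("X-Authentik-Username") or None


def place_details(
    place_id: str,
    headers: Mapping[str, str],
    fetch_from_google: Callable[[str], dict],
) -> Optional[dict]:
    """Serve cached data to everyone; make new API calls only for authed users."""
    cached = _places_cache.get(place_id)
    if cached is not None:
        return cached  # already paid for by an earlier authenticated lookup
    if get_user_id(headers) is None:
        return None  # guest: no new billable call is made
    result = fetch_from_google(place_id)
    _places_cache[place_id] = result
    return result
```

A guest request for an uncached place returns `None` without touching the paid API, while the same request after an authenticated lookup is served from the cache.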

README.md

RECON -- Knowledge Extraction Pipeline

Extracts structured knowledge from PDFs and web content into a Qdrant vector database for RAG retrieval by Aurora.

Quick Start

# Activate
cd /opt/recon && source venv/bin/activate

# Scan library for new PDFs
recon scan

# Queue and process
recon queue
recon extract
recon enrich
recon embed

# Or run full pipeline
recon run

# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process

# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process

# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"

# Search
recon search "water purification methods"

# Check status
recon status
recon failures
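
The stage order above can also be driven from a script. The following is a minimal sketch, assuming the `recon` CLI is on the PATH and returns a non-zero exit code when a stage fails (the README does not confirm this). The names `run_stages` and `run_cli` are hypothetical, and `recon run` may already behave differently.

```python
import subprocess
from typing import Callable, Sequence

# Stage order as listed in the Quick Start commands.
STAGES = ("scan", "queue", "extract", "enrich", "embed")


def run_cli(args: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(args)).returncode


def run_stages(
    stages: Sequence[str] = STAGES,
    runner: Callable[[Sequence[str]], int] = run_cli,
) -> list[str]:
    """Run `recon <stage>` in order, stopping at the first non-zero exit.

    Returns the stages that completed successfully.
    Assumption: a failed stage exits with a non-zero status.
    """
    completed = []
    for stage in stages:
        if runner(["recon", stage]) != 0:
            break
        completed.append(stage)
    return completed
```

Passing a custom `runner` makes it possible to dry-run the sequence without invoking the real CLI.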

Dashboard

http://100.64.0.24:8420

Services

Service	Location	Purpose
RECON Dashboard	recon:8420	Pipeline management + API
Qdrant	cortex:6333	Vector database
TEI	cortex:8090	Embeddings (1,711/sec)
Ollama	cortex:11434	Chat + fallback embeddings
OpenWebUI	cortex:8080 (ai.echo6.co)	Aurora chat with RAG
File Server	recon:8888 (files.echo6.co)	PDF downloads
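
A quick reachability check for the services in the table could be sketched as below. It only tests whether each TCP port accepts a connection, not whether the service behind it is healthy. The host and port pairs are copied from the table; the assumptions are that the hostnames `recon` and `cortex` resolve from wherever the script runs, and the function names are hypothetical.

```python
import socket

# Host/port pairs copied from the Services table; the hostnames are
# assumed to resolve on the local network.
SERVICES = {
    "RECON Dashboard": ("recon", 8420),
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
    "OpenWebUI": ("cortex", 8080),
    "File Server": ("recon", 8888),
}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def check_services(services=SERVICES) -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, (host, port) in services.items()}
```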

Key Paths

Path	Contents
/opt/recon/	Application code
/opt/recon/data/concepts/	Gemini extractions (CRITICAL -- back these up)
/opt/recon/data/text/	Extracted text
/opt/recon/data/recon.db	SQLite status DB
/mnt/library/	PDF library (NFS from pi-nas)

Backups

Backups run automatically every 6 hours to a Contabo VPS via /opt/recon/scripts/backup.sh. The concept JSONs are the most valuable data ($130+ of Gemini API work). Qdrant is NOT backed up; it can be rebuilt from the JSONs in ~10 minutes with recon rebuild.
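
Because a rebuild depends entirely on the concept JSONs, it may be worth confirming that they all parse before relying on them. The sketch below is one way to do that, assuming the concept files carry a `.json` extension somewhere under /opt/recon/data/concepts/ (the actual layout is not documented here). The function name `unreadable_concepts` is hypothetical and not part of RECON.

```python
import json
from pathlib import Path


def unreadable_concepts(concepts_dir: str = "/opt/recon/data/concepts") -> list[Path]:
    """Return concept files that cannot be read or parsed as JSON.

    Assumption: concept files use a .json extension under concepts_dir.
    """
    bad = []
    for path in sorted(Path(concepts_dir).rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):  # unreadable file or invalid JSON
            bad.append(path)
    return bad
```

An empty result means every file found parsed cleanly; it says nothing about whether the contents match what the pipeline expects.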

Monitoring

# Pipeline status
recon status

# Tail logs
tail -f /opt/recon/logs/recon.log

# Pipeline run log
tail -f /opt/recon/pipeline.log

# Validate consistency
recon validate --deep

Full Documentation

See PROJECT-BIBLE.md for complete system documentation.