matt/recon

mirror of https://github.com/zvx-echo6/recon.git synced 2026-06-10 00:44:37 +02:00

Python 83.3%
JavaScript 7.8%
HTML 6.4%
CSS 1.4%
Shell 1.1%

malice f42b1fef3b recon: add /api/wiki-enrich endpoint (extraction #5 prep, additive) (#8)	2026-05-22 13:23:08 -06:00

HTTP wrapper over the wiki_index lookup so the (future) navi-places service can fetch wiki enrichment over HTTP instead of reading recon's 2.1 GB data/wiki_index.db directly (Phase A option B -- HTTP coupling).

GET /api/wiki-enrich?wikidata=<Qid> (primary key)
GET /api/wiki-enrich?name=<name>&country=<cc> (fallback key)
-> 200 {wiki_summary?, wiki_population?, wiki_url?, wikivoyage_url?}
-> 400 if no usable key; 404 on no match.
Public (no auth, like /api/place/*).

Route keys are wikidata_id / name+country -- NOT osm_type/osm_id -- because that is how wiki_index is actually queried (the in-process _enrich_with_wiki_index looks up by result['wikidata_id'] then name+country_code, never by OSM id; see extraction-5-wiki-enrich-investigation.md). An osm-keyed route would have forced a redundant in-recon place lookup.

Changes (additive only):
- lib/place_detail.py: new standalone lookup_wiki_index(wikidata_id, name, country_code) doing the same two SELECTs + field/URL mapping as the in-process path, returning a dict or None. Pure DB read, never raises. `_enrich_with_wiki_index` is LEFT UNTOUCHED -- it can be DRY-refactored to delegate to this in a later PR; the in-process enrichment path is unchanged.
- lib/wiki_enrich_api.py: new wiki_enrich_bp blueprint with the route.
- lib/api.py: register the blueprint (one block).
- lib/wiki_enrich_api_test.py: 4 tests (hit-by-wikidata + decoded fields, no-match -> 404, name+country fallback, no-key -> 400) over an in-memory fixture DB; plain-assert style + __main__ runner (recon venv has no pytest). Verified green against recon's venv (flask 3.1.2).

Does NOT remove the in-process _enrich_with_wiki_index call from place_detail -- that happens in a later PR once navi-places is live and serving.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
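
The route contract described in this commit could be consumed by a client along the following lines. This is a minimal sketch, not the navi-places implementation: the helper names `build_wiki_enrich_url` and `fetch_wiki_enrichment` are hypothetical, the base URL is whatever host serves recon's API, and treating name without country as "no usable key" is an assumption about how the server decides on a 400.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def build_wiki_enrich_url(base_url, wikidata=None, name=None, country=None):
    """Build the /api/wiki-enrich URL, or return None if no usable key is given.

    The wikidata id is the primary key; name + country is the fallback key.
    Assumption: name alone (without country) does not count as a usable key.
    """
    if wikidata:
        query = {"wikidata": wikidata}
    elif name and country:
        query = {"name": name, "country": country}
    else:
        return None
    return f"{base_url.rstrip('/')}/api/wiki-enrich?{urlencode(query)}"


def fetch_wiki_enrichment(base_url, wikidata=None, name=None, country=None):
    """Return the enrichment dict on 200, or None on 404 / missing key."""
    url = build_wiki_enrich_url(base_url, wikidata, name, country)
    if url is None:
        return None  # the server would answer 400 for this request
    try:
        with urlopen(url, timeout=10) as resp:
            # All fields in the payload are optional per the commit message.
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None  # no match in wiki_index
        raise
```

Because every field in the 200 payload is optional, callers would read them with `payload.get("wiki_summary")` rather than indexing directly.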
config	recon: add auth.login_url/logout_url to deployment profiles (extraction #2)	2026-05-22 08:10:33 -06:00
lib	recon: add /api/wiki-enrich endpoint (extraction #5 prep, additive) (#8)	2026-05-22 13:23:08 -06:00
scripts	Add Overture Maps POI enrichment layer for place details	2026-04-21 16:51:25 +00:00
static	Replace wget/SingleFile/Playwright backends with Zimit	2026-04-19 14:06:23 +00:00
templates	Add Nav-I API key management UI	2026-04-23 06:50:44 +00:00
.gitignore	Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup	2026-04-19 02:28:49 +00:00
api.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
config.yaml	Replace wget/SingleFile/Playwright backends with Zimit	2026-04-19 14:06:23 +00:00
enricher.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
migrate_paths.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
PROJECT-BIBLE.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
README.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
recon.py	Add scraper Phase 2: smart crawl mode detection + browser fallback	2026-04-18 18:26:43 +00:00
requirements.txt	Add /api/reverse/<lat>/<lon> localhost-sourced enrichment bundle	2026-05-20 05:33:45 +00:00
run-pipeline-now.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
sweep_gated.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00

README.md

RECON -- Knowledge Extraction Pipeline

Extracts structured knowledge from PDFs and web content into a Qdrant vector database for RAG retrieval by Aurora.

Quick Start

# Activate
cd /opt/recon && source venv/bin/activate

# Scan library for new PDFs
recon scan

# Queue and process
recon queue
recon extract
recon enrich
recon embed

# Or run full pipeline
recon run

# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process

# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process

# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"

# Search
recon search "water purification methods"

# Check status
recon status
recon failures
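
When the manual stages above are scripted rather than run through `recon run`, it can be useful to stop at the first stage that fails. The following is a minimal sketch under two assumptions: that the stage order matches the manual sequence shown above, and that the `recon` CLI exits non-zero on failure. The function name `run_pipeline` is hypothetical.

```python
import subprocess

# Stage order taken from the manual steps in Quick Start.
STAGES = ["scan", "queue", "extract", "enrich", "embed"]


def run_pipeline(stages=STAGES, runner=subprocess.run):
    """Run each `recon <stage>` in order, stopping at the first failure.

    Returns (completed_stages, failed_stage); failed_stage is None when
    every stage succeeded. Assumes a non-zero exit code signals failure.
    """
    completed = []
    for stage in stages:
        result = runner(["recon", stage])
        if result.returncode != 0:
            return completed, stage
        completed.append(stage)
    return completed, None


if __name__ == "__main__":
    done, failed = run_pipeline()
    if failed:
        print(f"stopped at '{failed}' after completing: {done}")
    else:
        print("all stages completed")
```

The `runner` parameter exists only so the loop can be exercised without the real CLI installed.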

Dashboard

http://100.64.0.24:8420

Services

Service	Location	Purpose
RECON Dashboard	recon:8420	Pipeline management + API
Qdrant	cortex:6333	Vector database
TEI	cortex:8090	Embeddings (1,711/sec)
Ollama	cortex:11434	Chat + fallback embeddings
OpenWebUI	cortex:8080 (ai.echo6.co)	Aurora chat with RAG
File Server	recon:8888 (files.echo6.co)	PDF downloads
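
A quick way to confirm the services in this table are up is a plain TCP connect to each host and port. The sketch below assumes the short hostnames `recon` and `cortex` resolve from wherever it runs; it only checks that the port accepts connections, not that the service behind it is healthy. The names `is_reachable` and `check_services` are hypothetical.

```python
import socket

# Host/port pairs copied from the Services table.
SERVICES = {
    "RECON Dashboard": ("recon", 8420),
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
    "OpenWebUI": ("cortex", 8080),
    "File Server": ("recon", 8888),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_services(services=SERVICES, probe=is_reachable):
    """Map each service name to True/False depending on reachability."""
    return {name: probe(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:16} {'up' if up else 'DOWN'}")
```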

Key Paths

Path	Contents
/opt/recon/	Application code
/opt/recon/data/concepts/	Gemini extractions (CRITICAL -- back these up)
/opt/recon/data/text/	Extracted text
/opt/recon/data/recon.db	SQLite status DB
/mnt/library/	PDF library (NFS from pi-nas)

Backups

Backups run automatically every 6 hours to a Contabo VPS via /opt/recon/scripts/backup.sh. The concept JSONs are the most valuable data ($130+ of Gemini API work). Qdrant is NOT backed up; it can be rebuilt from the JSONs in ~10 minutes via recon rebuild.

Monitoring

# Pipeline status
recon status

# Tail logs
tail -f /opt/recon/logs/recon.log

# Pipeline run log
tail -f /opt/recon/pipeline.log

# Validate consistency
recon validate --deep

Full Documentation

See PROJECT-BIBLE.md for complete system documentation.