mirror of
https://github.com/zvx-echo6/recon.git
synced 2026-06-10 00:44:37 +02:00
No description
- Python 83.3%
- JavaScript 7.8%
- HTML 6.4%
- CSS 1.4%
- Shell 1.1%
HTTP wrapper over the wiki_index lookup so the (future) navi-places service can
fetch wiki enrichment over HTTP instead of reading recon's 2.1 GB
data/wiki_index.db directly (Phase A option B — HTTP coupling).
GET /api/wiki-enrich?wikidata=<Qid> (primary key)
GET /api/wiki-enrich?name=<name>&country=<cc> (fallback key)
-> 200 {wiki_summary?, wiki_population?, wiki_url?, wikivoyage_url?}
-> 400 if no usable key; 404 on no match. Public (no auth, like /api/place/*).
Route keys are wikidata_id / name+country — NOT osm_type/osm_id — because that
is how wiki_index is actually queried (the in-process _enrich_with_wiki_index
looks up by result['wikidata_id'] then name+country_code, never by OSM id; see
extraction-5-wiki-enrich-investigation.md). An osm-keyed route would have forced
a redundant in-recon place lookup.
Changes (additive only):
- lib/place_detail.py: new standalone lookup_wiki_index(wikidata_id, name,
country_code) doing the same two SELECTs + field/URL mapping as the
in-process path, returning a dict or None. Pure DB read, never raises.
`_enrich_with_wiki_index` is LEFT UNTOUCHED — it can be DRY-refactored to
delegate to this in a later PR; the in-process enrichment path is unchanged.
- lib/wiki_enrich_api.py: new wiki_enrich_bp blueprint with the route.
- lib/api.py: register the blueprint (one block).
- lib/wiki_enrich_api_test.py: 4 tests (hit-by-wikidata + decoded fields,
no-match -> 404, name+country fallback, no-key -> 400) over an in-memory
fixture DB; plain-assert style + __main__ runner (recon venv has no pytest).
Verified green against recon's venv (flask 3.1.2).
Does NOT remove the in-process _enrich_with_wiki_index call from place_detail —
that happens in a later PR once navi-places is live and serving.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| config | ||
| lib | ||
| scripts | ||
| static | ||
| templates | ||
| .gitignore | ||
| api.py | ||
| config.yaml | ||
| enricher.py | ||
| migrate_paths.py | ||
| PROJECT-BIBLE.md | ||
| README.md | ||
| recon.py | ||
| requirements.txt | ||
| run-pipeline-now.sh | ||
| sweep_gated.sh | ||
RECON -- Knowledge Extraction Pipeline
Extracts structured knowledge from PDFs and web content into a Qdrant vector database for RAG retrieval by Aurora.
Quick Start
# Activate
cd /opt/recon && source venv/bin/activate
# Scan library for new PDFs
recon scan
# Queue and process
recon queue
recon extract
recon enrich
recon embed
# Or run full pipeline
recon run
# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process
# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process
# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"
# Search
recon search "water purification methods"
# Check status
recon status
recon failures
Dashboard
Services
| Service | Location | Purpose |
|---|---|---|
| RECON Dashboard | recon:8420 | Pipeline management + API |
| Qdrant | cortex:6333 | Vector database |
| TEI | cortex:8090 | Embeddings (1,711/sec) |
| Ollama | cortex:11434 | Chat + fallback embeddings |
| OpenWebUI | cortex:8080 (ai.echo6.co) | Aurora chat with RAG |
| File Server | recon:8888 (files.echo6.co) | PDF downloads |
Key Paths
| Path | Contents |
|---|---|
| /opt/recon/ | Application code |
| /opt/recon/data/concepts/ | Gemini extractions (CRITICAL -- back these up) |
| /opt/recon/data/text/ | Extracted text |
| /opt/recon/data/recon.db | SQLite status DB |
| /mnt/library/ | PDF library (NFS from pi-nas) |
Backups
Automated every 6 hours to Contabo VPS via /opt/recon/scripts/backup.sh.
Concept JSONs are the most valuable data ($130+ of Gemini API work).
Qdrant is NOT backed up -- rebuilt from JSONs in ~10 minutes via recon rebuild.
Monitoring
# Pipeline status
recon status
# Tail logs
tail -f /opt/recon/logs/recon.log
# Pipeline run log
tail -f /opt/recon/pipeline.log
# Validate consistency
recon validate --deep
Full Documentation
See PROJECT-BIBLE.md for complete system documentation.