mirror of
https://github.com/zvx-echo6/recon.git
synced 2026-05-20 06:34:40 +02:00
- Python 86.8%
- HTML 6.1%
- JavaScript 5.4%
- CSS 1%
- Shell 0.7%
Two bugs in the Recently Completed table:

1. Title showed "Untitled" for all transcripts because the dashboard read documents.book_title (populated by PDF metadata voting), which is NULL for transcripts. Fixed by COALESCE(book_title, filename) in the SQL query -- falls back to catalogue.filename, which holds the real video title.
2. Type showed "WEB" for all transcripts because the type CASE expression only had web and pdf branches, with web matching any http% path -- and transcript paths are PeerTube watch URLs. Fixed by adding a transcript branch keyed on catalogue.source = stream.echo6.co, evaluated before the web branch.

Also adds badge-transcript CSS (purple) and JS rendering case. Applied consistently to both the Recently Completed and Sources table queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
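The two fixes described in this commit message can be illustrated with a small, self-contained sketch. Only the names documents.book_title, catalogue.filename, catalogue.source and the stream.echo6.co value come from the commit message; the join key (doc_id), the path column name, the sample rows, and the use of ELSE for the pdf branch are assumptions made for illustration, not the dashboard's actual query.

```python
import sqlite3

# Hypothetical, simplified schema. doc_id and path are assumed names.
SCHEMA = """
CREATE TABLE catalogue (doc_id INTEGER PRIMARY KEY, filename TEXT, path TEXT, source TEXT);
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, book_title TEXT);
"""

# COALESCE falls back to the filename when book_title is NULL.
# The transcript branch is listed before the web branch because CASE
# returns the first matching WHEN, and transcript paths also match 'http%'.
QUERY = """
SELECT COALESCE(d.book_title, c.filename) AS title,
       CASE
           WHEN c.source = 'stream.echo6.co' THEN 'transcript'
           WHEN c.path LIKE 'http%' THEN 'web'
           ELSE 'pdf'
       END AS type
FROM catalogue AS c
LEFT JOIN documents AS d ON d.doc_id = c.doc_id
ORDER BY c.doc_id
"""


def recent_rows(conn):
    """Return (title, type) tuples for every catalogue entry."""
    return conn.execute(QUERY).fetchall()


def demo_connection():
    """Build an in-memory database with one made-up row per source type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO catalogue VALUES (?, ?, ?, ?)",
        [
            (1, "field-guide.pdf", "/mnt/library/field-guide.pdf", None),
            (2, "Example Article", "https://example.com/article", "example.com"),
            (3, "How to Purify Water", "https://stream.echo6.co/w/abc123", "stream.echo6.co"),
        ],
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(1, "Field Guide"), (2, None), (3, None)],
    )
    return conn
```

Swapping the first two WHEN branches would reproduce the original bug: the transcript row would match the http% test first and be labelled web.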
| Name |
|---|
| lib |
| scripts |
| static |
| templates |
| .gitignore |
| api.py |
| config.yaml |
| enricher.py |
| migrate_paths.py |
| PROJECT-BIBLE.md |
| README.md |
| recon.py |
| requirements.txt |
| run-pipeline-now.sh |
| sweep_gated.sh |
RECON -- Knowledge Extraction Pipeline
Extracts structured knowledge from PDFs and web content into a Qdrant vector database for RAG retrieval by Aurora.
Quick Start
# Activate
cd /opt/recon && source venv/bin/activate
# Scan library for new PDFs
recon scan
# Queue and process
recon queue
recon extract
recon enrich
recon embed
# Or run full pipeline
recon run
# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process
# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process
# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"
# Search
recon search "water purification methods"
# Check status
recon status
recon failures
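For scripted use, the individual stage commands above could be driven from Python. This is a rough sketch, not part of RECON: the stage order is copied from the Quick Start, while the run_stages helper and the assumption that each recon subcommand exits non-zero on failure are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Stage order as listed in the Quick Start commands.
STAGES = ("scan", "queue", "extract", "enrich", "embed")


def run_stages(
    stages: Sequence[str] = STAGES,
    runner: Callable = subprocess.run,
) -> List[str]:
    """Run `recon <stage>` for each stage in order.

    Stops at the first stage whose process reports a non-zero return code
    (assumed here to signal failure) and returns the stages that completed.
    """
    completed = []
    for stage in stages:
        result = runner(["recon", stage])
        if result.returncode != 0:
            break
        completed.append(stage)
    return completed
```

The runner parameter exists so the loop can be exercised without the recon CLI installed. For a plain end-to-end run, recon run remains the simpler option.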
Dashboard
Services
| Service | Location | Purpose |
|---|---|---|
| RECON Dashboard | recon:8420 | Pipeline management + API |
| Qdrant | cortex:6333 | Vector database |
| TEI | cortex:8090 | Embeddings (1,711/sec) |
| Ollama | cortex:11434 | Chat + fallback embeddings |
| OpenWebUI | cortex:8080 (ai.echo6.co) | Aurora chat with RAG |
| File Server | recon:8888 (files.echo6.co) | PDF downloads |
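A quick way to see which of these services are accepting connections is a plain TCP check. The sketch below takes the host names and ports from the table above; the helper names are made up, and a successful connection only shows that the port is open, not that the service behind it is healthy.

```python
import socket

# Host/port pairs copied from the Services table.
SERVICES = {
    "RECON Dashboard": ("recon", 8420),
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
    "OpenWebUI": ("cortex", 8080),
    "File Server": ("recon", 8888),
}


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def check_services(services=SERVICES, timeout=2.0):
    """Map each service name to whether its port accepted a connection."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in services.items()
    }
```

Calling check_services() from a machine that can resolve recon and cortex returns a dict such as {"Qdrant": True, ...}, which can be printed or fed into an alert.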
Key Paths
| Path | Contents |
|---|---|
| /opt/recon/ | Application code |
| /opt/recon/data/concepts/ | Gemini extractions (CRITICAL -- back these up) |
| /opt/recon/data/text/ | Extracted text |
| /opt/recon/data/recon.db | SQLite status DB |
| /mnt/library/ | PDF library (NFS from pi-nas) |
Backups
Automated every 6 hours to Contabo VPS via /opt/recon/scripts/backup.sh.
Concept JSONs are the most valuable data ($130+ of Gemini API work).
Qdrant is NOT backed up -- it is rebuilt from the concept JSONs in ~10 minutes via recon rebuild.
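Because the concept JSONs are the one thing that cannot be cheaply regenerated, an extra local snapshot before risky operations may be worthwhile. The contents of backup.sh are not shown here, so the following is not that script; it is a minimal sketch of archiving the concepts directory with the standard library, and the function name, archive naming, and destination are assumptions.

```python
import tarfile
from pathlib import Path


def archive_concepts(concepts_dir, dest_dir, stamp):
    """Write a gzip-compressed tar archive of concepts_dir into dest_dir.

    stamp is any caller-supplied label (for example a date string) used in
    the archive file name. Returns the path of the archive that was written.
    """
    concepts_dir = Path(concepts_dir)
    dest = Path(dest_dir) / f"concepts-{stamp}.tar.gz"
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps archive paths relative ("concepts/...") instead of absolute.
        tar.add(concepts_dir, arcname=concepts_dir.name)
    return dest
```

A call such as archive_concepts("/opt/recon/data/concepts", "/tmp", "2026-05-20") would produce /tmp/concepts-2026-05-20.tar.gz, where the date label and /tmp destination are illustrative.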
Monitoring
# Pipeline status
recon status
# Tail logs
tail -f /opt/recon/logs/recon.log
# Pipeline run log
tail -f /opt/recon/pipeline.log
# Validate consistency
recon validate --deep
Full Documentation
See PROJECT-BIBLE.md for complete system documentation.