matt/recon

mirror of https://github.com/zvx-echo6/recon.git synced 2026-05-20 06:34:40 +02:00

Python 86.8%
HTML 6.1%
JavaScript 5.4%
CSS 1%
Shell 0.7%

Matt 70b80cb312 Phase 6b: fix dashboard Untitled/WEB bug for transcripts	2026-04-14 23:05:29 +00:00

Two bugs in the Recently Completed table:

1. Title showed "Untitled" for all transcripts because the dashboard read documents.book_title (populated by PDF metadata voting) which is NULL for transcripts. Fixed by COALESCE(book_title, filename) in the SQL query -- falls back to catalogue.filename which holds the real video title.

2. Type showed "WEB" for all transcripts because the type CASE expression only had web and pdf branches, with web matching any http% path -- and transcript paths are PeerTube watch URLs. Fixed by adding a transcript branch keyed on catalogue.source = stream.echo6.co, evaluated before the web branch.

Also adds badge-transcript CSS (purple) and JS rendering case. Applied consistently to both the Recently Completed and Sources table queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lib	Phase 6b: fix dashboard Untitled/WEB bug for transcripts	2026-04-14 23:05:29 +00:00
scripts	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
static	Phase 6b: fix dashboard Untitled/WEB bug for transcripts	2026-04-14 23:05:29 +00:00
templates	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
.gitignore	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
api.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
config.yaml	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
enricher.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
migrate_paths.py	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
PROJECT-BIBLE.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
README.md	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
recon.py	Phase 5c-1: dispatcher loop, filing worker loop, service rewire	2026-04-14 18:30:58 +00:00
requirements.txt	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
run-pipeline-now.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
sweep_gated.sh	Initial commit: RECON codebase baseline	2026-04-14 14:57:23 +00:00
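
The Phase 6b commit message above describes two SQL fixes: a COALESCE fallback for the title and a transcript branch placed before the web branch in the type CASE. The sketch below reproduces that logic against an in-memory SQLite database. The schema is an assumption: only documents.book_title, catalogue.filename, catalogue.source and the http% path match come from the commit message; the join key, the location of the path column, the sample rows and the function name recently_completed are hypothetical.

```python
import sqlite3

# Hypothetical minimal schema. Only book_title, filename, source and the
# "http%" path match are named in the commit message; the rest is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE catalogue (id INTEGER PRIMARY KEY, filename TEXT, source TEXT, path TEXT);
CREATE TABLE documents (catalogue_id INTEGER, book_title TEXT);
""")
conn.executemany(
    "INSERT INTO catalogue VALUES (?, ?, ?, ?)",
    [
        (1, "field-manual.pdf", "library", "/mnt/library/field-manual.pdf"),
        (2, "Example article", "web", "https://example.com/article"),
        (3, "Example video title", "stream.echo6.co", "https://stream.echo6.co/w/abc"),
    ],
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "Field Manual"), (2, None), (3, None)],  # NULL title for non-PDF rows
)

QUERY = """
SELECT COALESCE(d.book_title, c.filename) AS title,   -- fall back to filename
       CASE
           WHEN c.source = 'stream.echo6.co' THEN 'transcript'  -- checked first
           WHEN c.path LIKE 'http%' THEN 'web'
           ELSE 'pdf'
       END AS type
FROM catalogue c
LEFT JOIN documents d ON d.catalogue_id = c.id
ORDER BY c.id
"""

def recently_completed(connection):
    """Return (title, type) rows using the fallback title and ordered CASE."""
    return connection.execute(QUERY).fetchall()

rows = recently_completed(conn)
```

Because CASE branches are evaluated top to bottom, the transcript row would be labelled web if the two WHEN clauses were swapped: its path also starts with http.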

README.md

RECON -- Knowledge Extraction Pipeline

Extracts structured knowledge from PDFs and web content into a Qdrant vector database for RAG retrieval by Aurora.

Quick Start

# Activate
cd /opt/recon && source venv/bin/activate

# Scan library for new PDFs
recon scan

# Queue and process
recon queue
recon extract
recon enrich
recon embed

# Or run full pipeline
recon run

# Ingest a web page
recon ingest-url "https://example.com/article" --category "Category" --process

# Crawl an entire docs site
recon crawl "https://docs.example.com" --include /docs/ --category "Category" --process

# Upload a PDF
recon upload --file /path/to/document.pdf --category "Category"

# Search
recon search "water purification methods"

# Check status
recon status
recon failures
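
The stage-by-stage commands above (scan, queue, extract, enrich, embed) can also be driven from a script. This is a minimal sketch, not part of RECON: the helper names are hypothetical, and it assumes the recon CLI is on PATH and exits non-zero when a stage fails, which the README does not state.

```python
import subprocess

# Stage order copied from the Quick Start listing.
STAGES = ["scan", "queue", "extract", "enrich", "embed"]

def stage_commands(stages=STAGES):
    """Return one argv list per stage, e.g. ["recon", "scan"]."""
    return [["recon", stage] for stage in stages]

def run_stages(stages=STAGES, runner=subprocess.run):
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the name of the failed stage, or None if every stage succeeded.
    The runner argument exists so the loop can be exercised without the CLI.
    """
    for argv in stage_commands(stages):
        result = runner(argv)
        if result.returncode != 0:
            return argv[1]
    return None

# Usage (requires the recon CLI):
#   failed = run_stages()
#   if failed: print(f"stage {failed} failed; see recon failures")
```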

Dashboard

http://100.64.0.24:8420

Services

Service	Location	Purpose
RECON Dashboard	recon:8420	Pipeline management + API
Qdrant	cortex:6333	Vector database
TEI	cortex:8090	Embeddings (1,711/sec)
Ollama	cortex:11434	Chat + fallback embeddings
OpenWebUI	cortex:8080 (ai.echo6.co)	Aurora chat with RAG
File Server	recon:8888 (files.echo6.co)	PDF downloads
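
A quick way to confirm that the services in the table are listening is a plain TCP connect to each host and port. The sketch below assumes the short host names recon and cortex resolve from the machine running it; the function names are hypothetical, and a successful connect only shows the port is open, not that the service is healthy.

```python
import socket

# Hosts and ports copied from the Services table.
SERVICES = {
    "RECON Dashboard": ("recon", 8420),
    "Qdrant": ("cortex", 6333),
    "TEI": ("cortex", 8090),
    "Ollama": ("cortex", 11434),
    "OpenWebUI": ("cortex", 8080),
    "File Server": ("recon", 8888),
}

def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False

def check_services(services=SERVICES, timeout=2.0):
    """Map each service name to True/False depending on reachability."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in services.items()
    }

# Usage:
#   for name, ok in check_services().items():
#       print(f"{name}: {'up' if ok else 'DOWN'}")
```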

Key Paths

Path	Contents
/opt/recon/	Application code
/opt/recon/data/concepts/	Gemini extractions (CRITICAL -- back these up)
/opt/recon/data/text/	Extracted text
/opt/recon/data/recon.db	SQLite status DB
/mnt/library/	PDF library (NFS from pi-nas)

Backups

Backups run automatically every 6 hours to a Contabo VPS via /opt/recon/scripts/backup.sh. Concept JSONs are the most valuable data ($130+ of Gemini API work). Qdrant is NOT backed up -- it is rebuilt from the JSONs in ~10 minutes via recon rebuild.
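
Since the concept JSONs are the data worth protecting, a manual snapshot can be useful before risky changes. The sketch below is not what backup.sh does (its contents are not shown here); it is a hypothetical helper that packs every parseable JSON file under a directory into a gzip tarball and reports files that fail to parse. The function name and the example archive path are assumptions.

```python
import json
import tarfile
from pathlib import Path

def archive_concepts(concepts_dir, archive_path):
    """Pack every parseable *.json under concepts_dir into a gzip tarball.

    Returns (archived, skipped): two lists of paths relative to concepts_dir.
    Files that cannot be read or parsed as JSON are skipped, not archived.
    """
    root = Path(concepts_dir)
    archived, skipped = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(root.rglob("*.json")):
            rel = path.relative_to(root).as_posix()
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (ValueError, OSError):  # bad JSON, bad encoding, unreadable
                skipped.append(rel)
                continue
            tar.add(path, arcname=rel)
            archived.append(rel)
    return archived, skipped

# Usage (archive path is an example only):
#   ok, bad = archive_concepts("/opt/recon/data/concepts", "/tmp/concepts.tar.gz")
```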

Monitoring

# Pipeline status
recon status

# Tail logs
tail -f /opt/recon/logs/recon.log

# Pipeline run log
tail -f /opt/recon/pipeline.log

# Validate consistency
recon validate --deep

Full Documentation

See PROJECT-BIBLE.md for complete system documentation.