recon

matt/recon

mirror of https://github.com/zvx-echo6/recon.git synced 2026-06-10 08:54:34 +02:00

Author	SHA1	Message	Date
malice	14ad2cd34a	recon: add /api/wiki-rewrite endpoint (extraction #5 prep, additive) (#9) Per-tag HTTP wrapper over wiki_rewrite.rewrite_wiki_link so the (future) navi-places service can rewrite OSM wiki tags to local Kiwix URLs over HTTP instead of importing recon's wiki_rewrite module (which talks to Kiwix on localhost:8430 and the wiki_cache table in /opt/recon/data/place_cache.db). Companion to PR #8 (/api/wiki-enrich) — Matt picked option B (HTTP-couple the Kiwix offline-wiki rewriting too, since it matters in prod). GET /api/wiki-rewrite?tag=<wikipedia|wikidata|wikivoyage|appropedia>&value=<raw> -> 200 {url, status} where status is "local" | "public" | "original" -> 400 on missing value or unknown tag -> no 404 (unclassifiable value echoes back with status "original", mirroring rewrite_wiki_link) Public (no auth), like /api/place/* and /api/wiki-enrich. Changes (additive only): - lib/wiki_rewrite_api.py: new wiki_rewrite_bp blueprint. Thin route directly over the existing rewrite_wiki_link(tag, value) — no extraction needed (it's already a clean standalone function, unlike wiki-enrich's lookup). - lib/api.py: register the blueprint (one block). - lib/wiki_rewrite_api_test.py: 5 tests (local Kiwix hit, public fallback, unclassifiable -> original, missing value -> 400, unknown tag -> 400), stubbing check_kiwix_has_article (no Kiwix/DB), plain-assert + __main__ runner. Verified green against recon's venv (flask 3.1.2). Does NOT touch place_detail's in-process _enrich_wiki_links — that gets removed in a later PR once navi-places is live (same as PR #8). wiki_cache stays in recon's own place_cache.db post-cutover (harmless positive-cache duplication). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 14:08:18 -06:00
malice	f42b1fef3b	recon: add /api/wiki-enrich endpoint (extraction #5 prep, additive) (#8) HTTP wrapper over the wiki_index lookup so the (future) navi-places service can fetch wiki enrichment over HTTP instead of reading recon's 2.1 GB data/wiki_index.db directly (Phase A option B — HTTP coupling). GET /api/wiki-enrich?wikidata=<Qid> (primary key) GET /api/wiki-enrich?name=<name>&country=<cc> (fallback key) -> 200 {wiki_summary?, wiki_population?, wiki_url?, wikivoyage_url?} -> 400 if no usable key; 404 on no match. Public (no auth, like /api/place/*). Route keys are wikidata_id / name+country — NOT osm_type/osm_id — because that is how wiki_index is actually queried (the in-process _enrich_with_wiki_index looks up by result['wikidata_id'] then name+country_code, never by OSM id; see extraction-5-wiki-enrich-investigation.md). An osm-keyed route would have forced a redundant in-recon place lookup. Changes (additive only): - lib/place_detail.py: new standalone lookup_wiki_index(wikidata_id, name, country_code) doing the same two SELECTs + field/URL mapping as the in-process path, returning a dict or None. Pure DB read, never raises. `_enrich_with_wiki_index` is LEFT UNTOUCHED — it can be DRY-refactored to delegate to this in a later PR; the in-process enrichment path is unchanged. - lib/wiki_enrich_api.py: new wiki_enrich_bp blueprint with the route. - lib/api.py: register the blueprint (one block). - lib/wiki_enrich_api_test.py: 4 tests (hit-by-wikidata + decoded fields, no-match -> 404, name+country fallback, no-key -> 400) over an in-memory fixture DB; plain-assert style + __main__ runner (recon venv has no pytest). Verified green against recon's venv (flask 3.1.2). Does NOT remove the in-process _enrich_with_wiki_index call from place_detail — that happens in a later PR once navi-places is live and serving. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 13:23:08 -06:00
malice	75664c7d02	recon: remove /api/traffic/flow handler (now served by navi-traffic, extraction #1) The /api/traffic/flow/<z>/<x>/<y>.png handler is dead code in recon. As of extraction #1 of the recon<->Navi decoupling, this path is served by the standalone navi-traffic service. Live request flow is now: Caddy (CT 101, navi.echo6.co @authed_api, forward_auth) -> nginx :8440 (location ^~ /api/traffic/ -> proxy_cache traffic_cache) -> navi-traffic gunicorn :8421 (services/navi_traffic) Cutover verified live: authenticated browser fetch to https://navi.echo6.co/api/traffic/flow/... returns 200 image/png with X-Cache-Status MISS then HIT (120s cache), Server: gunicorn. navi-backend (github.com/zvx-echo6/navi-backend): - dae54f3 Initial scaffold: navi-backend + navi-traffic - 311cb8f nginx: use ^~ prefix on /api/traffic/ to beat .png regex catch-all Caddy cutover (@authed_api upstream 8420 -> nginx 8440) applied on Utility CT 101. Also drops the now-unused make_response flask import (no other uses in lib/api.py). os and http_requests remain (used elsewhere). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 01:12:20 -06:00
malice	dcd4ddd358	Migrate TomTom flow proxy from classic to Orbis Maps API	2026-05-21 16:07:54 -06:00
Matt	686b35710a	api: add auto mode to offroute endpoint validation Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-05-08 22:37:49 +00:00
Matt	2252905986	feat(offroute): MVUM legal access — pathfinder integration + places panel API + boundary_mode control MVUM Data Import: - Downloaded USFS MVUM Roads (150,636 features) and Trails (28,741 features) - Imported to navi.db as mvum_roads and mvum_trails tables - Idaho coverage: ~8,994 roads and ~4,504 trails across 7 national forests - Preserved all vehicle-class fields (ATV, MOTORCYCLE, HIGHCLEARANCEVEHICLE, etc.) - Preserved seasonal date ranges (_DATESOPEN fields) New mvum.py module: - MVUMReader class for querying MVUM data by bbox and nearest point - parse_date_range() for seasonal date string parsing (MM/DD-MM/DD format) - check_access() for determining open/closed status with date checking - symbol_to_access() fallback when per-vehicle fields are null - get_mvum_access_grid() for rasterizing MVUM to pathfinder grid Cost function integration: - Added mvum parameter to compute_cost_grid() - MVUM closures respond to boundary_mode: strict = impassable (np.inf) * pragmatic = 5x friction penalty * emergency = ignored entirely - Foot mode skips MVUM (motor-vehicle specific) Router integration: - Loads MVUM access grid for motorized modes (mtb, atv, vehicle) - Tracks mvum_closed_crossings in path summary Places Panel API: - GET /api/mvum?lat=XX&lon=XX&radius=50 - Returns MVUM feature with access status for all vehicle classes - Includes seasonal date ranges, maintenance level, forest/district info - GeoJSON geometry for map display Validation: - MVUM places endpoint tested with Sawtooth NF road - All four modes validated with strict/pragmatic/emergency boundary modes - Foot mode correctly ignores MVUM restrictions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-05-08 14:26:18 +00:00
Matt	bc463188d5	feat(offroute): Phase O4 — multi-mode cost functions (foot/mtb/atv/vehicle) - Add ModeProfile dataclass for data-driven mode configuration - Implement three speed functions: * Tobler off-path hiking (foot) * Herzog wheeled-transport polynomial (mtb/atv) * Linear speed degradation (vehicle) - Add WildernessReader for PAD-US Des_Tp=WA wilderness areas - Mode-specific terrain friction overrides: * Forest impassable for ATV/vehicle, high friction for MTB * Wetland/mangrove impassable for all wheeled modes - Trail access rules: * Foot trails (value 25) impassable for ATV/vehicle - Wilderness blocking for mtb/atv/vehicle modes - Vehicle mode allows flat grassland/cropland traversal - Memory optimization: limit entry points, constrain bbox size - Update router to pass mode and wilderness to cost function - Add vehicle to API mode validation Validated all four modes with test route: - foot: 0.46km off-network, 12.11km network, 89% on trail - mtb: 0.47km off-network, 13.13km network, 90% on trail - atv: 0.47km off-network, 12.81km network, 90% on trail - vehicle: 0.46km off-network, 12.81km network, 89% on trail Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-05-08 14:11:56 +00:00
Matt	1a9dfc8f8d	feat(offroute): Phase O3b — trail entry index, Valhalla stitching, /api/offroute endpoint Phase A: Trail Entry Point Index - Extract highway endpoints from idaho-latest.osm.pbf using osmium + ogr2ogr - Store 740,430 entry points in /mnt/nav/navi.db (SQLite with spatial index) - Entry points by class: service (271k), footway (152k), residential (146k), track (111k), path (26k), unclassified (16k), tertiary (9k), secondary (4k), primary (4k), bridleway (15) Phase B: Pathfinder → Valhalla Stitching (router.py) - OffrouteRouter orchestrates wilderness pathfinding + Valhalla on-network routing - Queries entry points within 50km (expanding to 100km if needed) - MCP pathfinder routes to nearest reachable entry point - Calls Valhalla pedestrian/bicycle/auto costing for on-network segment - Returns GeoJSON FeatureCollection with wilderness + network + combined segments Phase C: Flask Endpoint - POST /api/offroute with start/end coordinates, mode, boundary_mode - Returns GeoJSON route with per-segment metadata and turn-by-turn maneuvers Validated: 42.35,-114.30 → Twin Falls downtown - Wilderness: 0.5km, 9min | Network: 36km, 413min | Total: ~421min - 21 turn-by-turn instructions, segments connect at entry point Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-05-08 13:44:34 +00:00
Matt	121eb45b44	feat: add /api/auth/whoami endpoint for frontend auth state Returns {authenticated: bool, username: string|null} based on X-Authentik-Username header presence. Used by Navi frontend to detect auth state without triggering SSO redirect.	2026-04-27 01:26:44 +00:00
Matt	e9c9cee4f3	feat: Add wikidata lookup endpoint for place enrichment - Add get_place_by_wikidata() to place_detail.py - Queries Wikidata API for entity details (name, description, coords) - Extracts population, instance_of, OSM relation ID, Wikipedia link - Add /api/place/wikidata/<id> route to api.py Supports Navi basemap label enrichment when OSM details unavailable. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-26 08:15:16 +00:00
Matt	15c58a69ac	Add Nav-I API key management UI Replace /nav-i/api-keys stub with functional admin page for managing third-party API keys (Gemini, TomTom, Google Places). - New lib/api_keys_admin.py: list/update/test operations with masked display, atomic .env writes (.env.bak backup), provider-specific test calls (Gemini models.list, TomTom geocode, Google Places searchText) - 4 new endpoints: GET /api/nav-i/api-keys/list, POST .../update, POST .../test, POST .../restart-recon - Full UI: key table with masked values, per-key update modal with show/hide toggle, inline test results with latency, Gemini detail sub-table with per-key stats, RECON restart with confirmation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-23 06:50:44 +00:00
Matt	9c5b0520f9	Add PAD-US public land classification lookup Integrates USGS PAD-US 4.0 (651k features) into a local PostGIS database for point-in-polygon land ownership queries. Adds /api/landclass endpoint returning classifications, public/private status, and management hierarchy. - lib/landclass.py: connection pool, lookup_landclass(), domain label maps - lib/api.py: GET /api/landclass?lat=&lon= (feature-flag gated) - home.yaml: enable has_landclass flag Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-22 15:36:37 +00:00
Matt	3280e34718	Add Nav-I dashboard section with restore-as conflict resolution - Create Nav-I top-level section in dashboard navigation - Move Deleted Contacts from Knowledge subnav to Nav-I - Add Nav-I landing page with card grid (deleted count, API keys stub) - Add /nav-i/api-keys placeholder page - Add restore-as endpoint for Home/Work conflict resolution - Conflict modal in deleted contacts template for label rename on restore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-22 06:26:25 +00:00
Matt	a4288c0cd8	Add contacts/phone book system with per-user scoping New files: - lib/auth.py: Authentik forward-auth helpers (get_user_id, @require_auth) - lib/contacts.py: ContactsDB with CRUD, soft delete, restore, purge, find_nearby - lib/contacts_api.py: Flask Blueprint with 9 API endpoints at /api/contacts - templates/knowledge/deleted_contacts.html: Dashboard recovery page Modified: - lib/api.py: Register contacts_bp, add KNOWLEDGE_SUBNAV entry, /deleted-contacts route - config/profiles: has_contacts feature flag (true for home, false for pi profiles) Separate SQLite DB at data/contacts.db. Per-user isolation via X-Authentik-Username. Home/Work labels enforced unique per user. Haversine proximity queries (75m default). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-22 05:29:54 +00:00
Matt	2121ee4936	Add place detail proxy with Nominatim-first routing and Overpass fallback New /api/place/<osm_type>/<osm_id> endpoint returns cleaned OSM tag data for PlaceDetail panel enrichment. Routes to local Nominatim (Idaho coverage) first, falls back to Overpass public API for out-of-region queries. Responses cached in SQLite (data/place_cache.db) with no expiry. New modules: lib/place_detail.py (proxy + cache), lib/osm_categories.py (~50 category humanization mappings). Profile YAMLs updated with place_details config block and has_nominatim_details flag. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-21 03:06:51 +00:00
Matt	64605b38bb	Add TomTom traffic proxy and update profiles for hillshade/traffic layers - Add /api/traffic/flow proxy route to hide TomTom API key from frontend - Add tileset_hillshade and traffic config blocks to all three profiles - Flip has_hillshade and has_traffic_overlay flags in home and regional profiles - Minimal profile has config blocks but flags remain false (dormant) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-21 00:52:04 +00:00
Matt	e6b81db520	feat(navi): deployment profiles + /api/config endpoint Add profile-driven config infrastructure: - config/profiles/{home,regional_pi,minimal_pi}.yaml templates - lib/deployment_config.py loader (reads RECON_PROFILE env var) - GET /api/config returns active profile as JSON (5min cache) Frontend reads this on startup to determine tile source, defaults, and feature flags. No existing behavior changed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-20 23:35:39 +00:00
Matt	d4c5c371ca	Merge feature/navi-integration: Navi backend (address book, Netsyms, geocoding chain, reverse endpoint)	2026-04-20 22:40:03 +00:00
Matt	dfab388769	feat(navi): add netsyms tier-2 geocoding + geocode API Add Netsyms AddressDatabase2025 (159M US+CA addresses) as tier-2 in the geocode chain: address_book → netsyms → photon. - lib/netsyms.py: SQLite lookup module (lazy, read-only, thread-safe) - lib/netsyms_api.py: Flask blueprints for /api/netsyms/* and /api/geocode - lib/netsyms_test.py: 7 test cases (street, free-text, zipcode, health) - lib/nav_tools.py: new geocode() with consistent {name,lat,lon,source,raw} - lib/api.py: register netsyms_bp and geocode_bp Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-20 07:24:09 +00:00
Matt	23483e8198	feat(navi): address book with geocoding integration - YAML-backed saved locations (config/address_book.yaml) - Exact/partial alias matching with case-insensitive lookup - Flask blueprint: /api/address_book/lookup, /api/address_book/list - Geocoder short-circuits Photon when address book has exact match - Test suite for lookup behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-20 04:02:11 +00:00
Matt	8945c82e3f	Replace wget/SingleFile/Playwright backends with Zimit - Zimit Docker container handles all site types (static, SPA, JS redirects) - Removed: _detect_crawl_mode, _crawl_wget, _crawl_singlefile, preflight logic - Added: _crawl_zimit() with Docker lifecycle management - Simplified pipeline: submit → Zimit crawl → kiwix-manage register → done - No more zimwriterfs step — Zimit produces ZIM directly - Dashboard UI simplified: removed crawl mode dropdown - Config simplified: removed reject patterns, preflight, singlefile sections Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-19 14:06:23 +00:00
Matt	f0b160ef7c	Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup - Extract shared _full_zim_cleanup(source_id) from api_kiwix_remove - Add SIGHUP to kiwix-serve after kiwix-manage remove - Delete linked scrape_jobs rows during ZIM removal - Update api_scraper_delete to do full ZIM cleanup when applicable - Set chromium_path for single-file browser crawl support - Add status.db to .gitignore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-19 02:28:49 +00:00
Matt	45c3bb8d56	Add scraper job queue management (delete, clear failed) New API endpoints: DELETE single job, clear all failed/cancelled. Dashboard now shows Delete buttons on completed/failed jobs, Retry+Delete on failed jobs, and a Clear Failed bulk action. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-18 21:03:39 +00:00
Matt	1ce9a3731f	Add scraper dashboard UI under Kiwix tab New /kiwix/scraper page with submit form (URL, title, language, crawl mode), stats cards, and auto-refreshing jobs table with cancel/retry actions. Kiwix section now has Library/Scraper subnav. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-18 20:47:17 +00:00
Matt	da50e5f0b8	Add scraper Phase 2: smart crawl mode detection + browser fallback - Pre-flight detection: wget + Playwright probe to auto-detect if site needs browser rendering (JS apps, parking page redirects) - SingleFile CLI crawl backend for JS-rendered sites - crawl_mode column in scrape_jobs (static/browser/redirect/auto) - API: optional crawl_mode param on submit, cleared on retry - Config: rate_limit_delay 2.0→0.5, /api/ reject pattern, preflight + singlefile config sections - Prerequisites: Node.js 22, single-file-cli, Playwright + Chromium Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-18 18:26:43 +00:00
Matt	fed02186fa	Fix Kiwix status badges to reflect full pipeline state Status was showing COMPLETE after ZIM extraction finished, even when documents were still queued for enrichment/embedding. Now computes effective_status by checking actual pipeline state per-source: - DETECTED: ingest not enabled (gray) - EXTRACTING: ZIM processor running (blue) - PROCESSING: extracted but docs still in enricher/embedder queue (amber) - COMPLETE: all docs fully enriched and embedded in Qdrant (green) Also fixed _build_kiwix_sources pipeline query to filter by category per-source instead of returning global kiwix stats for every source. Progress column now shows "X / Y in Qdrant" when processing, or "X / Y extracted" otherwise. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-17 15:22:44 +00:00
Matt	6f2a1d206e	Add langdetect language filter to enricher + purge non-English ZIM articles - Install langdetect package for content-level language detection - Add _check_language() to enricher.py: reads first 1500 chars of first page, detects language via langdetect, skips if not in allowed list - Configurable via config.yaml pipeline.language_filter and pipeline.allowed_languages (default: en only) - Catches non-English content from ANY source (PDF, web, ZIM, PeerTube) before burning Gemini API quota on enrichment - Add scan_zims retry logic (3 attempts, 2s delay) for upload handler - Purged 6,483 stale non-English zim_articles rows from DB Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-17 14:37:13 +00:00
Matt	2635160887	Kiwix integration: ZIM processor, dashboard tab, wiki.echo6.co citations - ZIM processor: extract articles from ZIM files, feed into existing enrichment pipeline - Dashboard: Kiwix tab with library table, ingest toggle, upload, remove - kiwix-serve on port 8430, wiki.echo6.co behind Authentik - Citation URLs point to wiki.echo6.co/{zimname}/{article_path} - Dashboard shows WIKI type badge for ZIM-sourced content - Appropedia EN (19,445 articles) fully ingested as proof of concept	2026-04-17 07:00:24 +00:00
Matt	e6224cb279	Migrate dashboard upload to pipeline with multi-format support Upload handler now writes files to the appropriate hopper subfolder instead of copying directly to /mnt/library/: - .pdf -> acquired/pdf/ - .txt -> acquired/text/ - .epub, .doc, .docx, .mobi -> acquired/pdf/ (dispatcher format normalizer converts to PDF before processing) The dispatcher picks up files and routes through the appropriate processor (pdf_processor or text_processor) for full metadata voting, domain classification, and canonical filing. Changes to api_upload() / _process_upload(): - Relaxed extension check: PDF, TXT, EPUB, DOC, DOCX, MOBI - Routes to correct hopper subfolder by extension - Writes meta.json sidecar with original filename and category hint - Removed: direct library copy, add_to_catalogue, queue_document - Added: hopper-level dedup check (catches rapid re-uploads) - Kept: catalogue dedup check for immediate user feedback Changes to api_upload_status(): - Added fallback: checks acquired/ and processing/ dirs if hash not yet in documents table (covers gap between upload and dispatcher pickup) Template updated: accept attribute and help text now reflect multi-format support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-16 02:18:45 +00:00
Matt	7fe7d03583	Revert "Phase 6e: rewire dashboard PeerTube endpoint to acquisition module" This reverts commit `7e42528d2f`.	2026-04-15 03:20:46 +00:00
Matt	7e42528d2f	Phase 6e: rewire dashboard PeerTube endpoint to acquisition module Replace legacy ingest_channel/ingest_all imports with acquire_batch from lib.acquisition.peertube. The endpoint now writes flat file pairs to the hopper and lets the dispatcher handle processing, matching the Phase 6d architecture. Removes channel/since/process parameters that were tied to the old direct-ingest path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-15 03:15:41 +00:00
Matt	70b80cb312	Phase 6b: fix dashboard Untitled/WEB bug for transcripts Two bugs in the Recently Completed table: 1. Title showed "Untitled" for all transcripts because the dashboard read documents.book_title (populated by PDF metadata voting) which is NULL for transcripts. Fixed by COALESCE(book_title, filename) in the SQL query -- falls back to catalogue.filename which holds the real video title. 2. Type showed "WEB" for all transcripts because the type CASE expression only had web and pdf branches, with web matching any http% path -- and transcript paths are PeerTube watch URLs. Fixed by adding a transcript branch keyed on catalogue.source = stream.echo6.co, evaluated before the web branch. Also adds badge-transcript CSS (purple) and JS rendering case. Applied consistently to both the Recently Completed and Sources table queries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-14 23:05:29 +00:00
Matt	563c16bb71	Initial commit: RECON codebase baseline Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete). Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-14 14:57:23 +00:00

33 commits
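Commit 14ad2cd34a defines the /api/wiki-rewrite contract: a 400 response for a missing value or an unknown tag, otherwise a 200 response with {url, status} and no 404 path. The sketch below is a framework-free handler with that shape. It is not the blueprint's code. The name handle_wiki_rewrite is hypothetical, and the sketch assumes rewrite_wiki_link returns a (url, status) pair, which the commit message does not state.

```python
from typing import Callable, Dict, Optional, Tuple

# Tags the route accepts, as listed in the commit message.
ALLOWED_TAGS = {"wikipedia", "wikidata", "wikivoyage", "appropedia"}

# Assumed rewriter shape: (tag, value) -> (url, status).
Rewriter = Callable[[str, str], Tuple[str, str]]


def handle_wiki_rewrite(
    tag: Optional[str], value: Optional[str], rewrite: Rewriter
) -> Tuple[Dict[str, str], int]:
    """Hypothetical handler: validate the query params, then delegate."""
    if not value:
        return {"error": "missing value"}, 400
    if tag not in ALLOWED_TAGS:
        return {"error": "unknown tag"}, 400
    # No 404 branch: an unclassifiable value comes back from the
    # rewriter with status "original" and is still answered with 200.
    url, status = rewrite(tag, value)
    return {"url": url, "status": status}, 200
```

Passing the rewriter in as an argument lets a test stub it out, much as the commit's tests stub check_kiwix_has_article to avoid needing Kiwix or the cache DB.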
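Commit 2252905986 says MVUM closures depend on boundary_mode. Under strict, a closed cell is impassable. Under pragmatic, it carries a 5x friction penalty. Under emergency, closures are ignored. Foot mode skips MVUM entirely. The sketch below applies that rule to a single cell with a scalar cost. compute_cost_grid applies the rule over a NumPy grid instead, and the name mvum_adjusted_cost is ours.

```python
import math

PRAGMATIC_PENALTY = 5.0  # "5x friction penalty" from the commit message
MOTORIZED_MODES = {"mtb", "atv", "vehicle"}  # modes that load the MVUM grid


def mvum_adjusted_cost(base_cost: float, closed: bool,
                       boundary_mode: str, mode: str) -> float:
    """Hypothetical per-cell version of the MVUM closure rule."""
    # MVUM is motor-vehicle specific: foot travel, open cells and the
    # "emergency" boundary mode all keep the base cost.
    if mode not in MOTORIZED_MODES or not closed or boundary_mode == "emergency":
        return base_cost
    if boundary_mode == "strict":
        return math.inf  # impassable, like np.inf in the cost grid
    if boundary_mode == "pragmatic":
        return base_cost * PRAGMATIC_PENALTY
    raise ValueError(f"unknown boundary_mode: {boundary_mode!r}")
```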
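Commit 2252905986 also mentions parse_date_range() for the seasonal MM/DD-MM/DD strings in the MVUM _DATESOPEN fields. The code below is not that function. It is a hypothetical stand-in (parse_mmdd_range, is_open_on) showing one way to parse such a range and test a date against it. The sketch also handles ranges that wrap past December 31, although whether MVUM data actually contains wrapping ranges is an assumption.

```python
from datetime import date
from typing import Optional, Tuple

MonthDay = Tuple[int, int]


def _parse_mmdd(text: str) -> MonthDay:
    month, day = (int(part) for part in text.strip().split("/"))
    date(2000, month, day)  # range check; 2000 is a leap year, so 02/29 passes
    return month, day


def parse_mmdd_range(text: str) -> Optional[Tuple[MonthDay, MonthDay]]:
    """Parse 'MM/DD-MM/DD'; return None when the string is malformed."""
    try:
        start, end = text.split("-")
        return _parse_mmdd(start), _parse_mmdd(end)
    except ValueError:
        return None


def is_open_on(rng: Tuple[MonthDay, MonthDay], day: date) -> bool:
    """True if `day` falls inside the (inclusive) seasonal range."""
    start, end = rng
    today = (day.month, day.day)
    if start <= end:
        return start <= today <= end
    # The range wraps past Dec 31, e.g. 12/01-03/31.
    return today >= start or today <= end
```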
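Commit bc463188d5 uses Tobler's off-path hiking function for foot mode. In Tobler's published form, on-path walking speed is W = 6 exp(-3.5 |dh/dx + 0.05|) km/h, and off-path speed is 3/5 of that. The minimal sketch below computes that speed and a per-cell traversal time. The log does not show how ModeProfile parameterizes the function, so these names are ours.

```python
import math


def tobler_speed_kmh(slope: float, off_path: bool = True) -> float:
    """Tobler's hiking function; slope is rise over run (dh/dx)."""
    speed = 6.0 * math.exp(-3.5 * abs(slope + 0.05))
    # Tobler's off-path adjustment: 3/5 of the on-path speed.
    return speed * 0.6 if off_path else speed


def traversal_seconds(distance_m: float, slope: float, off_path: bool = True) -> float:
    """Time to cross a cell of the given horizontal length."""
    speed_m_per_s = tobler_speed_kmh(slope, off_path) * 1000.0 / 3600.0
    return distance_m / speed_m_per_s
```

Speed peaks on a slight downhill (slope -0.05) and falls off exponentially in both directions, which is why the cost is asymmetric between climbing and descending.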
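Commit 15c58a69ac lists two requirements: masked key display, and atomic .env writes that keep a .env.bak backup. The sketch below shows one common way to meet both with the standard library. It is not lib/api_keys_admin.py. The names mask_key and set_env_key are hypothetical, and the keep-last-four masking rule is an assumption. The write goes to a temp file in the same directory and is swapped in with os.replace, which is atomic on POSIX when source and target are on the same filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def mask_key(value: str, keep: int = 4) -> str:
    """Show only the last `keep` characters of a secret."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


def set_env_key(env_path: Path, key: str, value: str) -> None:
    """Set KEY=value in a .env file atomically, keeping a .env.bak copy."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    out, replaced = [], False
    for line in lines:
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        out.append(f"{key}={value}")

    if env_path.exists():
        # Back up the current file before touching it.
        shutil.copy2(env_path, env_path.with_name(env_path.name + ".bak"))

    # Write a temp file next to the target, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=str(env_path.parent), prefix=".env.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("\n".join(out) + "\n")
        os.replace(tmp, env_path)
    except BaseException:
        os.unlink(tmp)
        raise
```

A reader of the .env file therefore sees either the old file or the new one, never a half-written file. Note that mkstemp creates the temp file with owner-only permissions, and the replaced .env inherits them.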
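Commit a4288c0cd8 describes haversine proximity queries with a 75 m default radius. The minimal in-memory sketch below assumes contacts are dicts with lat/lon keys and uses a mean Earth radius of 6,371 km. The real ContactsDB.find_nearby runs against SQLite and may filter differently.

```python
import math
from typing import Any, Dict, Iterable, List

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def find_nearby(contacts: Iterable[Dict[str, Any]], lat: float, lon: float,
                radius_m: float = 75.0) -> List[Dict[str, Any]]:
    """Contacts within radius_m of (lat, lon), nearest first."""
    hits = []
    for c in contacts:
        d = haversine_m(lat, lon, c["lat"], c["lon"])
        if d <= radius_m:
            hits.append((d, c))
    hits.sort(key=lambda t: t[0])  # sort on distance only, never on the dicts
    return [c for _, c in hits]
```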
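Commit dfab388769 chains the geocoders in the order address_book, then netsyms, then photon, and normalizes every result to {name, lat, lon, source, raw}. The sketch below shows that tiered fallback. It assumes each tier is a callable that returns a raw dict with lat/lon on a hit or None on a miss. The tier functions used in testing are stubs, not recon's modules.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

RawHit = Dict[str, Any]
Tier = Tuple[str, Callable[[str], Optional[RawHit]]]


def geocode(query: str, tiers: List[Tier]) -> Optional[Dict[str, Any]]:
    """Try each tier in order and normalize the first hit."""
    for source, lookup in tiers:
        raw = lookup(query)
        if raw is None:
            continue  # miss: fall through to the next tier
        return {
            "name": raw.get("name", query),
            "lat": float(raw["lat"]),
            "lon": float(raw["lon"]),
            "source": source,
            "raw": raw,
        }
    return None
```

The address book is the first tier, so an address-book match never reaches Netsyms or Photon. This matches the short-circuit described in commit 23483e8198.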
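Commit fed02186fa derives each source's effective_status from the actual pipeline state instead of from the extractor finishing. The sketch below shows that decision order. It assumes three hypothetical inputs: whether ingest is enabled, whether the ZIM processor is running, and how many of the source's documents are still queued for enrichment or embedding.

```python
def effective_status(ingest_enabled: bool, extractor_running: bool,
                     docs_pending: int) -> str:
    """Hypothetical mapping from pipeline state to the dashboard badge."""
    if not ingest_enabled:
        return "DETECTED"    # gray: ingest not enabled
    if extractor_running:
        return "EXTRACTING"  # blue: ZIM processor running
    if docs_pending > 0:
        return "PROCESSING"  # amber: docs still in the enricher/embedder queue
    return "COMPLETE"        # green: everything embedded in Qdrant
```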
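Commit e6224cb279 routes uploads into hopper subfolders by extension. The mapping below is copied from the commit message. hopper_target is a hypothetical helper, and the hopper root passed in is an assumption.

```python
from pathlib import Path

# Extension -> hopper subfolder, as listed in the commit message.
# EPUB/DOC/DOCX/MOBI go to the PDF folder because the dispatcher's
# format normalizer converts them to PDF before processing.
HOPPER_SUBDIRS = {
    ".pdf": "acquired/pdf",
    ".txt": "acquired/text",
    ".epub": "acquired/pdf",
    ".doc": "acquired/pdf",
    ".docx": "acquired/pdf",
    ".mobi": "acquired/pdf",
}


def hopper_target(hopper_root: Path, filename: str) -> Path:
    """Where an upload should be written; raises for unsupported types."""
    name = Path(filename).name  # drop any directory components
    ext = Path(name).suffix.lower()
    if ext not in HOPPER_SUBDIRS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    return hopper_root / HOPPER_SUBDIRS[ext] / name
```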