Commit graph

21 commits

Author SHA1 Message Date
3280e34718 Add Nav-I dashboard section with restore-as conflict resolution
- Create Nav-I top-level section in dashboard navigation
- Move Deleted Contacts from Knowledge subnav to Nav-I
- Add Nav-I landing page with card grid (deleted count, API keys stub)
- Add /nav-i/api-keys placeholder page
- Add restore-as endpoint for Home/Work conflict resolution
- Conflict modal in deleted contacts template for label rename on restore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-22 06:26:25 +00:00
a4288c0cd8 Add contacts/phone book system with per-user scoping
New files:
- lib/auth.py: Authentik forward-auth helpers (get_user_id, @require_auth)
- lib/contacts.py: ContactsDB with CRUD, soft delete, restore, purge, find_nearby
- lib/contacts_api.py: Flask Blueprint with 9 API endpoints at /api/contacts
- templates/knowledge/deleted_contacts.html: Dashboard recovery page

Modified:
- lib/api.py: Register contacts_bp, add KNOWLEDGE_SUBNAV entry, /deleted-contacts route
- config/profiles: has_contacts feature flag (true for home, false for pi profiles)

Separate SQLite DB at data/contacts.db. Per-user isolation via X-Authentik-Username.
Home/Work labels enforced unique per user. Haversine proximity queries (75m default).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-22 05:29:54 +00:00
2121ee4936 Add place detail proxy with Nominatim-first routing and Overpass fallback
New /api/place/<osm_type>/<osm_id> endpoint returns cleaned OSM tag data
for PlaceDetail panel enrichment. Routes to local Nominatim (Idaho coverage)
first, falls back to Overpass public API for out-of-region queries. Responses
cached in SQLite (data/place_cache.db) with no expiry.

New modules: lib/place_detail.py (proxy + cache), lib/osm_categories.py
(~50 category humanization mappings). Profile YAMLs updated with
place_details config block and has_nominatim_details flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 03:06:51 +00:00
64605b38bb Add TomTom traffic proxy and update profiles for hillshade/traffic layers
- Add /api/traffic/flow proxy route to hide TomTom API key from frontend
- Add tileset_hillshade and traffic config blocks to all three profiles
- Flip has_hillshade and has_traffic_overlay flags in home and regional profiles
- Minimal profile has config blocks but flags remain false (dormant)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 00:52:04 +00:00
e6b81db520 feat(navi): deployment profiles + /api/config endpoint
Add profile-driven config infrastructure:
- config/profiles/{home,regional_pi,minimal_pi}.yaml templates
- lib/deployment_config.py loader (reads RECON_PROFILE env var)
- GET /api/config returns active profile as JSON (5min cache)

Frontend reads this on startup to determine tile source, defaults,
and feature flags. No existing behavior changed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 23:35:39 +00:00
d4c5c371ca Merge feature/navi-integration: Navi backend (address book, Netsyms, geocoding chain, reverse endpoint) 2026-04-20 22:40:03 +00:00
dfab388769 feat(navi): add netsyms tier-2 geocoding + geocode API
Add Netsyms AddressDatabase2025 (159M US+CA addresses) as tier-2
in the geocode chain: address_book → netsyms → photon.

- lib/netsyms.py: SQLite lookup module (lazy, read-only, thread-safe)
- lib/netsyms_api.py: Flask blueprints for /api/netsyms/* and /api/geocode
- lib/netsyms_test.py: 7 test cases (street, free-text, zipcode, health)
- lib/nav_tools.py: new geocode() with consistent {name,lat,lon,source,raw}
- lib/api.py: register netsyms_bp and geocode_bp

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 07:24:09 +00:00
23483e8198 feat(navi): address book with geocoding integration
- YAML-backed saved locations (config/address_book.yaml)
- Exact/partial alias matching with case-insensitive lookup
- Flask blueprint: /api/address_book/lookup, /api/address_book/list
- Geocoder short-circuits Photon when address book has exact match
- Test suite for lookup behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 04:02:11 +00:00
8945c82e3f Replace wget/SingleFile/Playwright backends with Zimit
- Zimit Docker container handles all site types (static, SPA, JS redirects)
- Removed: _detect_crawl_mode, _crawl_wget, _crawl_singlefile, preflight logic
- Added: _crawl_zimit() with Docker lifecycle management
- Simplified pipeline: submit → Zimit crawl → kiwix-manage register → done
- No more zimwriterfs step — Zimit produces ZIM directly
- Dashboard UI simplified: removed crawl mode dropdown
- Config simplified: removed reject patterns, preflight, singlefile sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-19 14:06:23 +00:00
f0b160ef7c Extract _full_zim_cleanup helper, add SIGHUP + scrape_jobs cleanup
- Extract shared _full_zim_cleanup(source_id) from api_kiwix_remove
- Add SIGHUP to kiwix-serve after kiwix-manage remove
- Delete linked scrape_jobs rows during ZIM removal
- Update api_scraper_delete to do full ZIM cleanup when applicable
- Set chromium_path for single-file browser crawl support
- Add status.db to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-19 02:28:49 +00:00
45c3bb8d56 Add scraper job queue management (delete, clear failed)
New API endpoints: DELETE single job, clear all failed/cancelled.
Dashboard now shows Delete buttons on completed/failed jobs,
Retry+Delete on failed jobs, and a Clear Failed bulk action.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 21:03:39 +00:00
1ce9a3731f Add scraper dashboard UI under Kiwix tab
New /kiwix/scraper page with submit form (URL, title, language,
crawl mode), stats cards, and auto-refreshing jobs table with
cancel/retry actions. Kiwix section now has Library/Scraper subnav.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 20:47:17 +00:00
da50e5f0b8 Add scraper Phase 2: smart crawl mode detection + browser fallback
- Pre-flight detection: wget + Playwright probe to auto-detect if site
  needs browser rendering (JS apps, parking page redirects)
- SingleFile CLI crawl backend for JS-rendered sites
- crawl_mode column in scrape_jobs (static/browser/redirect/auto)
- API: optional crawl_mode param on submit, cleared on retry
- Config: rate_limit_delay 2.0→0.5, /api/ reject pattern, preflight
  + singlefile config sections
- Prerequisites: Node.js 22, single-file-cli, Playwright + Chromium

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 18:26:43 +00:00
fed02186fa Fix Kiwix status badges to reflect full pipeline state
Status was showing COMPLETE after ZIM extraction finished, even when
documents were still queued for enrichment/embedding. Now computes
effective_status by checking actual pipeline state per-source:

- DETECTED: ingest not enabled (gray)
- EXTRACTING: ZIM processor running (blue)
- PROCESSING: extracted but docs still in enricher/embedder queue (amber)
- COMPLETE: all docs fully enriched and embedded in Qdrant (green)

Also fixed _build_kiwix_sources pipeline query to filter by category
per-source instead of returning global kiwix stats for every source.

Progress column now shows "X / Y in Qdrant" when processing, or
"X / Y extracted" otherwise.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 15:22:44 +00:00
6f2a1d206e Add langdetect language filter to enricher + purge non-English ZIM articles
- Install langdetect package for content-level language detection
- Add _check_language() to enricher.py: reads first 1500 chars of first
  page, detects language via langdetect, skips if not in allowed list
- Configurable via config.yaml pipeline.language_filter and
  pipeline.allowed_languages (default: en only)
- Catches non-English content from ANY source (PDF, web, ZIM, PeerTube)
  before burning Gemini API quota on enrichment
- Add scan_zims retry logic (3 attempts, 2s delay) for upload handler
- Purged 6,483 stale non-English zim_articles rows from DB

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-17 14:37:13 +00:00
2635160887 Kiwix integration: ZIM processor, dashboard tab, wiki.echo6.co citations
- ZIM processor: extract articles from ZIM files, feed into existing enrichment pipeline
- Dashboard: Kiwix tab with library table, ingest toggle, upload, remove
- kiwix-serve on port 8430, wiki.echo6.co behind Authentik
- Citation URLs point to wiki.echo6.co/{zimname}/{article_path}
- Dashboard shows WIKI type badge for ZIM-sourced content
- Appropedia EN (19,445 articles) fully ingested as proof of concept
2026-04-17 07:00:24 +00:00
e6224cb279 Migrate dashboard upload to pipeline with multi-format support
Upload handler now writes files to the appropriate hopper subfolder
instead of copying directly to /mnt/library/:
- .pdf -> acquired/pdf/
- .txt -> acquired/text/
- .epub, .doc, .docx, .mobi -> acquired/pdf/ (dispatcher format
  normalizer converts to PDF before processing)

The dispatcher picks up files and routes through the appropriate
processor (pdf_processor or text_processor) for full metadata
voting, domain classification, and canonical filing.

Changes to api_upload() / _process_upload():
- Relaxed extension check: PDF, TXT, EPUB, DOC, DOCX, MOBI
- Routes to correct hopper subfolder by extension
- Writes meta.json sidecar with original filename and category hint
- Removed: direct library copy, add_to_catalogue, queue_document
- Added: hopper-level dedup check (catches rapid re-uploads)
- Kept: catalogue dedup check for immediate user feedback

Changes to api_upload_status():
- Added fallback: checks acquired/ and processing/ dirs if hash
  not yet in documents table (covers gap between upload and
  dispatcher pickup)

Template updated: accept attribute and help text now reflect
multi-format support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 02:18:45 +00:00
7fe7d03583 Revert "Phase 6e: rewire dashboard PeerTube endpoint to acquisition module"
This reverts commit 7e42528d2f.
2026-04-15 03:20:46 +00:00
7e42528d2f Phase 6e: rewire dashboard PeerTube endpoint to acquisition module
Replace legacy ingest_channel/ingest_all imports with acquire_batch
from lib.acquisition.peertube. The endpoint now writes flat file pairs
to the hopper and lets the dispatcher handle processing, matching the
Phase 6d architecture. Removes channel/since/process parameters that
were tied to the old direct-ingest path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 03:15:41 +00:00
70b80cb312 Phase 6b: fix dashboard Untitled/WEB bug for transcripts
Two bugs in the Recently Completed table:

1. Title showed "Untitled" for all transcripts because the dashboard
   read documents.book_title (populated by PDF metadata voting) which
   is NULL for transcripts. Fixed by COALESCE(book_title, filename)
   in the SQL query -- falls back to catalogue.filename which holds
   the real video title.

2. Type showed "WEB" for all transcripts because the type CASE
   expression only had web and pdf branches, with web matching any
   http% path -- and transcript paths are PeerTube watch URLs.
   Fixed by adding a transcript branch keyed on catalogue.source =
   stream.echo6.co, evaluated before the web branch.

Also adds badge-transcript CSS (purple) and JS rendering case.
Applied consistently to both the Recently Completed and Sources
table queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 23:05:29 +00:00
563c16bb71 Initial commit: RECON codebase baseline
Current state of the pipeline code as of 2026-04-14 (Phase 1 scaffolding complete).
Config has new_pipeline.enabled=false and crawler.sites=[] per refactor plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-14 14:57:23 +00:00