The /api/geocode endpoint blended Photon and Netsyms results, but only
Photon respected viewport bias from prior work. Address queries to
Netsyms/AddressDB returned globally-sorted matches regardless of where
the user was looking — searching '214 North St' from Idaho returned
Illinois results.
Now fetches up to 200 Netsyms results when viewport lat/lon provided,
sorts by squared distance from viewport center, then returns top N.
Falls back to default ordering when viewport absent. Photon path
unchanged.
Request polygon_geojson=1 from Nominatim to include admin boundary
polygons in place detail responses. Also fetch boundary via OSM
relation ID for wikidata lookups.
- Add get_place_by_wikidata() to place_detail.py
- Queries Wikidata API for entity details (name, description, coords)
- Extracts population, instance_of, OSM relation ID, Wikipedia link
- Add /api/place/wikidata/<id> route to api.py
Supports Navi basemap label enrichment when OSM details unavailable.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add lat/lon/zoom params to geocode() and _retrieve_photon_freetext()
- Update nav_tools.py wrapper to pass through viewport params
- Add /api/geocode handler support for lat/lon/zoom query params
- Add _safe_float() helper for param validation
- Cast zoom to int for Photon compatibility
Allows the frontend to pass current map center/zoom to bias
search results toward the visible area.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Guest users receive local and cached data only. New Google Places API
calls are only triggered for authenticated users, protecting against
cost exploitation on the public navi.echo6.co frontend.
The pattern: cached Google data flows freely (already paid for by an
authed lookup). New API calls require X-Authentik-Username via
get_user_id() check.
Replace /nav-i/api-keys stub with functional admin page for managing
third-party API keys (Gemini, TomTom, Google Places).
- New lib/api_keys_admin.py: list/update/test operations with masked
display, atomic .env writes (.env.bak backup), provider-specific
test calls (Gemini models.list, TomTom geocode, Google Places
searchText)
- 4 new endpoints: GET /api/nav-i/api-keys/list, POST .../update,
POST .../test, POST .../restart-recon
- Full UI: key table with masked values, per-key update modal with
show/hide toggle, inline test results with latency, Gemini detail
sub-table with per-key stats, RECON restart with confirmation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrites OSM wikipedia/wikidata/wikivoyage/appropedia extratag values
to local Kiwix URLs (wiki.echo6.co) when the article exists in a loaded
ZIM, falling back silently to public URLs otherwise.
- New lib/wiki_rewrite.py: URL classification, Kiwix OPDS catalog
discovery (xml.etree.ElementTree), HEAD-based availability check,
positive-only SQLite cache, disabled discovery stubs
- place_detail.py: _enrich_wiki_links() at both Nominatim and Overpass
enrichment sites, before cache_put
- Profile flags: has_wiki_rewriting (home/regional: true, minimal: false),
has_wiki_discovery (all: false, stubs for future activation)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrates USGS PAD-US 4.0 (651k features) into a local PostGIS database
for point-in-polygon land ownership queries. Adds /api/landclass endpoint
returning classifications, public/private status, and management hierarchy.
- lib/landclass.py: connection pool, lookup_landclass(), domain label maps
- lib/api.py: GET /api/landclass?lat=&lon= (feature-flag gated)
- home.yaml: enable has_landclass flag
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fills opening_hours, phone, and website gaps when OSM + Overture data
is incomplete. Only fires for business-class POIs (amenity, shop, tourism,
leisure, office, craft). Daily API call cap with SQLite tracking.
cache_put now preserves google columns across cache refreshes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a query contains no road-type keywords (st, blvd, ave, etc.),
boost amenity/shop/tourism/leisure/office/craft results (+3.0) and
penalize highway/route results (-4.0). This fixes searches like
"starbucks twin falls" where a named service road outranked the
actual business POI due to Photon position tiebreaking.
Also fixes:
- Intent classifier now recognizes full state names ("idaho" not
just "ID") for LOCALITY classification
- Locality-type Photon results now populate _city from name field
so they participate in locality_fuzz scoring
- Trace logging expanded to all candidates with osm_key/value
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Businesses with housenumbers (e.g. M&W Markets at 130 US-30) were
classified as street_address because the housenumber check fired before
the osm_key check. Reorder so osm_key in amenity/shop/tourism/leisure/office
is evaluated first, ensuring businesses get type=poi regardless of
whether they have a street address. Also adds office to the POI key set.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ingests 20.9M North America places from Overture Maps Foundation
(release 2026-04-15.0) into PostgreSQL. Enriches /api/place responses
with phone, website, and brand data via spatial + fuzzy name matching
when OSM extratags are sparse.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New /api/place/<osm_type>/<osm_id> endpoint returns cleaned OSM tag data
for PlaceDetail panel enrichment. Routes to local Nominatim (Idaho coverage)
first, falls back to Overpass public API for out-of-region queries. Responses
cached in SQLite (data/place_cache.db) with no expiry.
New modules: lib/place_detail.py (proxy + cache), lib/osm_categories.py
(~50 category humanization mappings). Profile YAMLs updated with
place_details config block and has_nominatim_details flag.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add /api/traffic/flow proxy route to hide TomTom API key from frontend
- Add tileset_hillshade and traffic config blocks to all three profiles
- Flip has_hillshade and has_traffic_overlay flags in home and regional profiles
- Minimal profile has config blocks but flags remain false (dormant)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add profile-driven config infrastructure:
- config/profiles/{home,regional_pi,minimal_pi}.yaml templates
- lib/deployment_config.py loader (reads RECON_PROFILE env var)
- GET /api/config returns active profile as JSON (5min cache)
Frontend reads this on startup to determine tile source, defaults,
and feature flags. No existing behavior changed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Accepts lat/lon query params, calls Photon /reverse, returns same
response shape as /api/geocode. Returns 200 with empty results on
no match (graceful degradation for ocean/unmapped areas).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Inverts the /api/geocode chain. Photon is now the primary search
engine; the hand-rolled Netsyms free-text parser is removed.
Address book short-circuits nicknames only ("home", "work") —
full-address queries flow through Photon and address book
entries within 75m annotate matching results with labeled_as.
Coordinate strings detected before search.
Response shape: /api/geocode now returns a ranked candidates
list (always 200 OK, empty list if no match). No more 404 for
unmatched queries. Users can type messy input — wrong case,
missing punctuation, abbreviations, typos — and get results
or close matches.
Netsyms preserved at /api/netsyms/lookup for direct access.
USPS plus4 enrichment of Photon street-address hits is a
planned follow-up.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lookup() previously did exact-alias-only matching, so "214 north st
filer" missed the home entry with alias "214 north st". Extend to
match when the query begins with an alias followed by a word
boundary, and when an alias appears as a contiguous token sequence
inside the query. Short aliases ("home") keep matching exactly and
also match with trailing text.
Fixes the UX case where typing a known full address falls through
to Netsyms instead of short-circuiting to address_book.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- YAML-backed saved locations (config/address_book.yaml)
- Exact/partial alias matching with case-insensitive lookup
- Flask blueprint: /api/address_book/lookup, /api/address_book/list
- Geocoder short-circuits Photon when address book has exact match
- Test suite for lookup behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add centroid-based query classifier that routes Aurora queries to the
appropriate handler (nav_route, nav_reverse_geocode, direct_answer,
rag_search) before the RAG pipeline runs. Uses TEI embeddings against
pre-computed route centroids from 38 example queries.
- query_router.py: standalone module with lazy centroid init
- query_router_test.py: 7-query test suite (all passing)
- Corresponding recon_rag_tool.py v4.2.0 deployed to Open WebUI DB
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- nav_tools.py: route() geocodes via Photon, routes via Valhalla, returns
summary/maneuvers/polyline. reverse_geocode() for coordinate lookups.
Supports auto/pedestrian/bicycle/truck modes.
- nav_tools_test.py: 5 live tests against local Photon (2322) and Valhalla (8002)
- aurora_nav_tool.py: Open WebUI Tool exposing get_directions to Aurora LLM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Parse Browsertrix "crawled":N JSON format instead of "N pages"
- Add 3s delay between SIGHUP to kiwix-serve and scan_zims() call
so the OPDS catalog is reloaded before we query it for linking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parse "crawled":N from Browsertrix crawlStatus JSON logs instead of
looking for "N pages" pattern. Also check stdout (not just stderr).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
warc2zim (called internally by zimit) requires --name for ZIM metadata.
Without it, argument validation fails with exit code 2.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Must pass `zimit` as command after image name (entrypoint execs args)
- --url → --seeds, --name removed, --lang → --zim-lang, --workers → -w
- Remove --rm so docker logs work after exit, manually rm container
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract shared _full_zim_cleanup(source_id) from api_kiwix_remove
- Add SIGHUP to kiwix-serve after kiwix-manage remove
- Delete linked scrape_jobs rows during ZIM removal
- Update api_scraper_delete to do full ZIM cleanup when applicable
- Set chromium_path for single-file browser crawl support
- Add status.db to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New API endpoints: DELETE single job, clear all failed/cancelled.
Dashboard now shows Delete buttons on completed/failed jobs,
Retry+Delete on failed jobs, and a Clear Failed bulk action.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New /kiwix/scraper page with submit form (URL, title, language,
crawl mode), stats cards, and auto-refreshing jobs table with
cancel/retry actions. Kiwix section now has Library/Scraper subnav.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Format: {domain}_{lang}_{YYYY-MM}_{job_id}.zim
Prevents zimwriterfs failures when the same domain is scraped
multiple times in the same month.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SingleFile CLI has no --crawl-delay option. The invalid flag caused the
process to print help and exit with no output. Added --crawl-no-parent
and --crawl-replace-URLs instead. Removed unused crawl_delay config key.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Status was showing COMPLETE after ZIM extraction finished, even when
documents were still queued for enrichment/embedding. Now computes
effective_status by checking actual pipeline state per-source:
- DETECTED: ingest not enabled (gray)
- EXTRACTING: ZIM processor running (blue)
- PROCESSING: extracted but docs still in enricher/embedder queue (amber)
- COMPLETE: all docs fully enriched and embedded in Qdrant (green)
Also fixed _build_kiwix_sources pipeline query to filter by category
per-source instead of returning global kiwix stats for every source.
Progress column now shows "X / Y in Qdrant" when processing, or
"X / Y extracted" otherwise.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Install langdetect package for content-level language detection
- Add _check_language() to enricher.py: reads first 1500 chars of first
page, detects language via langdetect, skips if not in allowed list
- Configurable via config.yaml pipeline.language_filter and
pipeline.allowed_languages (default: en only)
- Catches non-English content from ANY source (PDF, web, ZIM, PeerTube)
before burning Gemini API quota on enrichment
- Add scan_zims retry logic (3 attempts, 2s delay) for upload handler
- Purged 6,483 stale non-English zim_articles rows from DB
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip articles with MediaWiki translation suffixes (/es, /fr, /pl, etc.)
before text extraction to avoid wasting Gemini enrichment on translations.
Uses path-based regex matching against ISO 639 language codes.
~5,276 non-English articles already ingested from Appropedia (top: es=837,
zh=765, ru=475, fr=433, ko=407). Purge decision deferred.
- ZIM processor: extract articles from ZIM files, feed into existing enrichment pipeline
- Dashboard: Kiwix tab with library table, ingest toggle, upload, remove
- kiwix-serve on port 8430, wiki.echo6.co behind Authentik
- Citation URLs point to wiki.echo6.co/{zimname}/{article_path}
- Dashboard shows WIKI type badge for ZIM-sourced content
- Appropedia EN (19,445 articles) fully ingested as proof of concept
Adds lib/processors/zim_processor.py which opens a ZIM file via
python-libzim, iterates HTML articles, strips to clean text (lxml),
and feeds each article into the existing RECON enrichment pipeline.
Key features:
- HTML to text via lxml (strips nav/footer/script/style)
- Filters redirects, non-HTML entries, stubs (<200 chars)
- Content hash dedup against existing catalogue
- Creates processing dirs with page files and meta.json
- Registers articles as "extracted" for automatic enrichment
- Checkpointing via zim_sources.last_checkpoint for resume
- Configurable batch size and delay for rate control
- Standalone CLI: python3 -m lib.processors.zim_processor
Tested: 100 Appropedia articles processed in 3s, enricher picks
them up automatically via the existing pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add lib/zim_monitor.py: polls kiwix-serve OPDS v2 catalog, detects
new ZIMs, reads accurate article count from python-libzim Counter
metadata (not inflated OPDS count), inserts into zim_sources table.
Idempotent on re-run, marks removed ZIMs.
- DB schema: zim_sources, zim_samples, zim_articles tables (created
via sqlite3, not in migrations — matches existing RECON pattern)
- kiwix-tools 3.7.0 installed from binary tarball at /opt/recon/bin/
(Ubuntu 24.04 apt ships 3.5.0 which lacks OPDS v2)
- kiwix.service systemd unit on port 8430
- python-libzim 3.9.0 installed
- Test ZIM: Appropedia EN maxi (496 MB, 19,445 articles)
- Add bin/ to .gitignore (binary tarball, not source)
Upload handler now writes files to the appropriate hopper subfolder
instead of copying directly to /mnt/library/:
- .pdf -> acquired/pdf/
- .txt -> acquired/text/
- .epub, .doc, .docx, .mobi -> acquired/pdf/ (dispatcher format
normalizer converts to PDF before processing)
The dispatcher picks up files and routes through the appropriate
processor (pdf_processor or text_processor) for full metadata
voting, domain classification, and canonical filing.
Changes to api_upload() / _process_upload():
- Relaxed extension check: PDF, TXT, EPUB, DOC, DOCX, MOBI
- Routes to correct hopper subfolder by extension
- Writes meta.json sidecar with original filename and category hint
- Removed: direct library copy, add_to_catalogue, queue_document
- Added: hopper-level dedup check (catches rapid re-uploads)
- Kept: catalogue dedup check for immediate user feedback
Changes to api_upload_status():
- Added fallback: checks acquired/ and processing/ dirs if hash
not yet in documents table (covers gap between upload and
dispatcher pickup)
Template updated: accept attribute and help text now reflect
multi-format support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same fix as text_processor — Gemini sometimes returns the literal
string "null" instead of JSON null for empty metadata fields. The
voting logic and Gemini extraction now both treat "null" strings
as None.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds _normalize_formats() to the dispatcher that converts non-standard
document formats to PDF before dispatch. Supports:
- .epub, .mobi -> PDF via ebook-convert (Calibre)
- .doc, .docx -> PDF via LibreOffice headless
Called per-subfolder before _find_pairs() so _find_pairs() only ever
sees standard content files. Conversion failures are logged and
skipped -- the original file stays in acquired/ for manual review.
Also converts 3 staged epub files and cleans up _staging/.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>