feat(3-K): real geocoder backends + producer-doc reframe + consumer-doc enrichment

Second of three PRs for v0.5.0 (J shipped the framework; this fills in real
backends + documents the reframed design principle in-tree; L is the events
tab + map fix, then tag).

Backends (all satisfy GeocoderBackend; never raise, all-null on any failure):
- NaviBackend — composed Navi /api/reverse/<lat>/<lon> (name/address + timezone
  + landclass + elevation in one call). Near-passthrough: response already
  matches the canonical 9-field shape. Best-effort warmup ping (Boise) on
  construction when a loop is running; config `headers` slot for a future
  Authorization: Bearer (config-only, no code change). Default base_url
  http://192.168.1.130:8440.
- PhotonBackend — raw Photon /reverse?lat&lon&limit=1 (name/address only).
  Maps features[0].properties; postal_code <- postcode; timezone/landclass/
  elevation_m null (Navi-composed-endpoint extras).
- NominatimBackend — OSM Nominatim /reverse?format=jsonv2 (name/address only).
  Configurable rate limit (default 1/sec; 0 disables for self-hosted) +
  required User-Agent. Maps the address block; landclass/elevation_m/timezone
  null.
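
The never-raise, all-null contract above can be sketched for the Photon mapping roughly as follows. This is a minimal illustration, not the code in this PR: FIELDS is a hypothetical subset standing in for GEOCODER_FIELDS (only postal_code, timezone, landclass and elevation_m are named in this message), and map_photon is an invented helper name.

```python
from typing import Any, Optional

# Hypothetical subset of the canonical fields, for illustration only.
FIELDS = ("name", "city", "state", "country", "postal_code",
          "timezone", "landclass", "elevation_m")


def all_null() -> dict[str, Optional[Any]]:
    """The failure result: every field present, every value None."""
    return {field: None for field in FIELDS}


def map_photon(payload: Any) -> dict[str, Optional[Any]]:
    """Map a Photon /reverse GeoJSON payload onto FIELDS; never raises."""
    result = all_null()
    try:
        props = payload["features"][0]["properties"]
        result["name"] = props.get("name")
        result["city"] = props.get("city")
        result["state"] = props.get("state")
        result["country"] = props.get("country")
        # Photon calls it "postcode"; the canonical key is postal_code.
        result["postal_code"] = props.get("postcode")
        # timezone/landclass/elevation_m stay None: Photon does not supply them.
    except Exception:
        # Missing keys, wrong types, empty feature list: all collapse to all-null.
        return all_null()
    return result
```

Extra keys in the Photon properties are dropped simply because only the known keys are copied.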

Registered all three in supervisor _BACKEND_REGISTRY (resolved by EnrichmentConfig
backend_class name).
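
A name-keyed registry of this kind might look like the sketch below. Everything here is hypothetical (the register decorator, resolve, ExampleBackend, and the ValueError on an unknown name); the supervisor's actual _BACKEND_REGISTRY handling may differ.

```python
from typing import Callable

# Maps a class name, as it would appear in config, to a backend factory.
_REGISTRY: dict[str, Callable[..., object]] = {}


def register(cls):
    """Class decorator: file the class under its own name."""
    _REGISTRY[cls.__name__] = cls
    return cls


@register
class ExampleBackend:
    """Placeholder backend that just keeps its config."""

    def __init__(self, **config):
        self.config = config


def resolve(backend_class: str, **config):
    """Instantiate the backend registered under backend_class."""
    try:
        cls = _REGISTRY[backend_class]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown backend_class {backend_class!r}; known: {known}") from None
    return cls(**config)
```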

Docs — design pivot now in-tree:
- PRODUCER §2 reframed: the verbatim Matt quote stays; the translation inverts.
  Central is the consumer's only data plane (consumers can't do follow-up
  lookups), so enrich deliberately and centrally, namespaced under _enriched,
  failing to null. "No enrichment" is gone.
- PRODUCER §10.1 inverted: enrichment is expected; the anti-pattern is doing it
  OUTSIDE the framework (inline in poll(), bypassing cache + _enriched
  namespacing + the never-raise safety net).
- PRODUCER new §13 Enrichment contract: Enricher / GeocoderEnricher /
  GeocoderBackend Protocols, NoOpBackend default, sqlite cache + TTL +
  cache-all-null + don't-cache-on-raise semantics, _enriched.<name> provenance,
  per-field coverage matrix (cross-checked against GEOCODER_FIELDS), and the
  landclass antimeridian known wrinkle.
- CONSUMER FIRMS section: documents the data._enriched.geocoder bundle (9
  fields), per-region coverage (US-full, non-US timezone+elevation), and the
  antimeridian landclass caveat.
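
The cache semantics listed for §13 (TTL, cache-all-null, don't-cache-on-raise) can be illustrated with a small sqlite sketch. The class name, table layout, coordinate rounding and the None return on failure are all assumptions made for the example, not the framework's actual schema.

```python
import json
import sqlite3
import time


class CachedLookup:
    """Wraps a backend callable (lat, lon) -> dict with a TTL cache."""

    def __init__(self, backend, ttl_s=3600.0, clock=time.time):
        self._backend = backend
        self._ttl_s = ttl_s
        self._clock = clock
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT NOT NULL, "
            "stored_at REAL NOT NULL)"
        )

    def lookup(self, lat, lon):
        key = f"{lat:.4f},{lon:.4f}"  # assumed rounding, for illustration
        row = self._db.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None and self._clock() - row[1] < self._ttl_s:
            return json.loads(row[0])  # fresh hit, including all-null results
        try:
            result = self._backend(lat, lon)
        except Exception:
            return None  # a raise is NOT cached, so the next call retries
        # An all-null result is a valid answer and is cached like any other.
        self._db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (key, json.dumps(result), self._clock()),
        )
        return result
```

Caching all-null answers avoids re-querying points the backend genuinely cannot resolve, while skipping the cache on a raise keeps a transient outage from being remembered for a full TTL.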

Tests:
- test_navi/photon/nominatim_backend.py — happy-path field mapping, null
  handling, extra-key drop, network/timeout/non-200/malformed -> all-null
  (never raises), Nominatim rate-limit (disabled + spacing) + User-Agent.
  Env-gated live Navi smoke (NAVI_INTEGRATION_TEST=1; skipped by default — the
  192.168.1.130 endpoint isn't reachable from CT104's segment).
- test_producer_doc.py — +4: §2 verbatim quote present, §10.1 subsection exists,
  §13 names all four protocol types, §13 coverage matrix == GEOCODER_FIELDS
  (derived from code, not hardcoded).
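
The rate-limit cases (disabled + spacing) can be exercised without real sleeping by injecting the clock and sleep functions. The limiter below is a hypothetical synchronous sketch; the NominatimBackend in this PR may be async and structured differently.

```python
import time


class MinIntervalLimiter:
    """Enforces a minimum spacing between calls; per_second=0 disables it."""

    def __init__(self, per_second, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / per_second if per_second > 0 else 0.0
        self._clock = clock
        self._sleep = sleep
        self._next_ok = 0.0  # earliest time the next call may proceed

    def wait(self):
        if self._interval == 0.0:
            return  # disabled, e.g. for a self-hosted instance
        now = self._clock()
        delay = self._next_ok - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_ok = now + self._interval


# Usage with a frozen fake clock: the second call must wait a full interval.
sleeps = []
limiter = MinIntervalLimiter(1.0, clock=lambda: 100.0, sleep=sleeps.append)
limiter.wait()
limiter.wait()
```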

Verification: full pytest 525 passed, 1 skipped (was 495; +30 backend +
4 doc tests, -1 the env-gated skip). grep subject_for_event/_ADAPTER_REGISTRY
clean. All three backends import + resolve via the registry.

Flagged for later (NOT done here): adapters besides FIRMS that should declare
enrichment_locations (nwis, eonet, gdacs, usgs_quake, wfigs_*) — that's PR L
scope alongside the events tab. See PR description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matt Johnson 2026-05-20 16:10:44 +00:00
commit 98b050b2af
11 changed files with 833 additions and 37 deletions

@@ -23,8 +23,14 @@ from pathlib import Path
from central.adapter import SourceAdapter
from central.adapter_discovery import discover_adapters
from central.enrichment.geocoder import GEOCODER_FIELDS
from central.streams import STREAMS

# The verbatim design-principle quote that must stay in §2 (Matt, 2026-05-19).
_DESIGN_PRINCIPLE_QUOTE = (
    "Central takes it all and gives it all. It's up to the pipe to do with it"
)

DOC_PATH = Path(__file__).resolve().parents[1] / "docs" / "PRODUCER-INTEGRATION.md"
@@ -186,6 +192,48 @@ def test_streams_snippet_quotes_live_registry():
)


def _section(doc: str, header_re: str) -> str:
    """Return the body of the section whose header matches header_re, up to the
    next same-or-higher-level header."""
    m = re.search(header_re + r"\s*\n(.*?)(?=^## |\Z)", doc, re.DOTALL | re.MULTILINE)
    assert m, f"doc missing section matching {header_re!r}"
    return m.group(1)


def test_design_principle_quote_present_in_section_2():
    """§2 must still carry the verbatim Matt quote; the reframe changes the
    translation beneath it, not the quote itself."""
    section = _section(_doc_text(), r"^## 2\. The design principle")
    assert _DESIGN_PRINCIPLE_QUOTE in section, "verbatim design-principle quote missing from §2"


def test_anti_pattern_10_1_section_exists():
    """§10.1 must still exist as a subsection (content reframed to
    'enrichment outside the framework', structure preserved)."""
    doc = _doc_text()
    assert re.search(r"^### 10\.1 ", doc, re.MULTILINE), "doc missing '### 10.1' subsection"


def test_enrichment_contract_section_13_has_all_protocol_references():
    """New §13 must name all four enrichment contract types verbatim."""
    section = _section(_doc_text(), r"^## 13\. Enrichment contract")
    for ref in ("Enricher", "GeocoderEnricher", "GeocoderBackend", "NoOpBackend"):
        assert ref in section, f"§13 missing reference to {ref!r}"


def test_enrichment_coverage_matrix_lists_exactly_geocoder_fields():
    """The §13 per-field coverage matrix must list exactly the canonical
    GEOCODER_FIELDS derived from code, not hardcoded here."""
    section = _section(_doc_text(), r"^## 13\. Enrichment contract")
    # Matrix rows look like: | `field_name` | ... |
    row_fields = set(re.findall(r"^\|\s*`([a-z_]+)`\s*\|", section, re.MULTILINE))
    assert row_fields == set(GEOCODER_FIELDS), (
        f"coverage-matrix field drift: "
        f"doc-only={row_fields - set(GEOCODER_FIELDS)}, "
        f"code-only={set(GEOCODER_FIELDS) - row_fields}"
    )


def test_no_orphan_adapter_references_in_anti_patterns():
    """Anti-patterns section names two real adapter modules as examples
    (firms, inciweb in §10.4). Those names must still resolve via